V. Christina et al. / (IJCSE) International Journal on Computer Science and Engineering
Vol. 02, No. 09, 2010, 3126-3129
Email Spam Filtering using Supervised Machine
Learning Techniques
V. Christina #, S. Karpagavalli *, G. Suganya #
# M.Phil Research Scholar, Department of Computer Science (PG),
P.S.G.R Krishnammal College for Women
* Senior Lecturer,
GR Govindarajulu School of Applied Computer Technology
Abstract: E-mail spam, also known as unsolicited bulk email
(UBE), junk mail, or unsolicited commercial email (UCE), is the
practice of sending unwanted e-mail messages, frequently with
commercial content, in large quantities to an indiscriminate set
of recipients. Spam is prevalent on the Internet because the
transaction cost of electronic communication is radically lower
than that of any alternative form of communication. Many spam
filters exist, using different approaches to identify an incoming
message as spam: whitelists and blacklists, Bayesian analysis,
keyword matching, mail header analysis, postage, legislation,
and content scanning. Even so, we are still flooded with spam
emails every day. This is not because the filters are not
powerful enough; it is due to the swift adoption of new
techniques by spammers and the inability of spam filters to
adapt to the changes. In our work, we employ supervised machine
learning techniques to filter email spam messages. Three widely
used supervised machine learning techniques, namely the C4.5
decision tree classifier, the Multilayer Perceptron, and the
Naïve Bayes classifier, are used to learn the features of spam
emails, and the models are built by training on known spam
emails and legitimate emails. The results of the models are
discussed.
Keywords: Spam, Spam filter, Spammer, Mail header,
Machine learning, Classifier
I. INTRODUCTION
The internet has become an integral part of everyday
life and e-mail has become a powerful tool for information
exchange. Along with the growth of the Internet and e-mail,
there has been a dramatic growth in spam in recent years.
Spam can originate from any location across the globe where
Internet access is available. Despite the development of anti-
spam services and technologies, the number of spam
messages continues to increase rapidly. To address this
growing problem, each organization must analyze the tools
available to determine how best to counter spam in its
environment. Tools such as the corporate e-mail system,
e-mail filtering gateways, contracted anti-spam services, and
end-user training provide an important arsenal for any
organization. However, users cannot avoid the very serious
problem of dealing with large amounts of spam on a regular
basis. Without anti-spam measures, spam will inundate network
systems, kill employee productivity, steal bandwidth, and
still be there tomorrow.
II. SPAM FILTER ARCHITECTURE AND METHODS
E-mail spam, known as unsolicited bulk Email (UBE), junk
mail, or unsolicited commercial email (UCE), is the practice
of sending unwanted e-mail messages, frequently with
commercial content, in large quantities to an indiscriminate
set of recipients. The technical definition of spam is ‘An
electronic message is "spam" if (A) the recipient's personal
identity and context are irrelevant because the message is
equally applicable to many other potential recipients; and (B)
the recipient has not verifiably granted deliberate, explicit,
and still-revocable permission for it to be sent’. Filtering
spam carries a risk: legitimate mail may sometimes be rejected
or marked as spam. Not filtering spam carries risks as well:
the constant flood of spam clogs networks and user inboxes,
drains valuable resources such as bandwidth and storage
capacity, reduces productivity, and interferes with the
expedient delivery of legitimate email.
Spam filters can be implemented at all layers. Filters can sit
at a firewall in front of the email server, or at the MTA
(Mail Transfer Agent) on the email server, to provide an
integrated anti-spam and anti-virus solution that offers
complete email protection at the network perimeter, before
unwanted or potentially dangerous email reaches the network.
Spam filters can also be installed at the MDA (Mail Delivery
Agent) level as a service to all customers. At the email
client, users can have personalized spam filters that
automatically filter mail according to their chosen criteria.
Figure 1 shows the typical architecture of a spam filter.
Several different methods are used to identify incoming
messages as spam, including whitelists and blacklists, Bayesian
analysis, mail header analysis, and keyword checking. A
whitelist is a list of all addresses from which the user always
wishes to receive mail.
The user can add email addresses, entire domains, or
functional domains. An interesting option is an automatic
whitelist management tool, which eliminates the need for
administrators to enter approved addresses manually and ensures
that mail from particular senders or domains is never flagged
as spam.
ISSN : 0975-3397
The number of records can be configured. When an overflow
occurs, obsolete records are overwritten. A blacklist works
analogously: it is a list of addresses from which the user
never wants to receive mail. Mail header checking consists of a
set of rules; when a mail header matches one of them, the mail
server returns the message. Typical rules match messages that
have a blank "From" field, that list many addresses from the
same source in the "To" field, or that have too many digits in
their email addresses (a fairly popular method of generating
false addresses). Header checking can also return messages
based on the language code declared in the header.
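The header rules described above can be sketched in a few lines of Python. This is only an illustration using the standard library `email` package, not the rule set of any particular mail server: the function name `header_flags`, the flag names, and the thresholds `max_recipients` and `max_digits` are assumptions introduced here, and the "many addresses" rule is simplified to a plain count of "To" recipients.

```python
import re
from email import message_from_string
from email.utils import getaddresses


def header_flags(raw_message, max_recipients=10, max_digits=4):
    """Return the names of the header checks that a raw message triggers.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    msg = message_from_string(raw_message)
    flags = []

    # Rule: blank or missing "From" field.
    if not (msg.get("From") or "").strip():
        flags.append("blank_from")

    # Rule: many addresses listed in the "To" field.
    recipients = getaddresses(msg.get_all("To", []))
    if len(recipients) > max_recipients:
        flags.append("many_recipients")

    # Rule: too many digits in the local part of any address.
    senders = getaddresses(msg.get_all("From", []))
    for _name, addr in senders + recipients:
        local_part = addr.split("@")[0]
        if len(re.findall(r"\d", local_part)) > max_digits:
            flags.append("digit_heavy_address")
            break

    return flags
```

A message whose sender address is `win98765432@example.com` would trigger only the digit rule under these assumed thresholds, while a message with an empty "From" line would trigger only `blank_from`.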
In Bayesian analysis, the word probabilities (also known
as likelihood functions) are used to compute the probability
that an email containing a particular set of words belongs to
either category. This probability is called the posterior
probability and is computed using Bayes' theorem. The email's
spam probability is then computed over all words in the email,
and if it exceeds a certain threshold, the filter marks the
email as spam. Keyword checking is another method widely used
in spam filtering. It works by scanning both the subject and
the body of the email. Using "conditions", i.e. combinations of
keywords, is a good way to improve filtering efficiency. We can
specify the combinations of words that must appear in a spam
email and update this list over time. All messages that include
these words are blocked.
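As a minimal sketch of the Bayesian step, the posterior spam probability can be computed from per-word likelihoods as follows. The function name, the dictionaries of word likelihoods, and the prior of 0.5 are illustrative assumptions; words are treated as independent (the naive assumption), and words missing from either table are simply skipped.

```python
import math


def spam_probability(words, p_word_given_spam, p_word_given_ham, prior_spam=0.5):
    """Posterior P(spam | words), assuming the words are independent.

    p_word_given_spam and p_word_given_ham map a word to its likelihood
    in each class; both tables are hypothetical inputs.
    """
    # Work in log space so that many small probabilities do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        if word in p_word_given_spam and word in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])

    # Normalise the two joint scores into a probability (Bayes' theorem).
    largest = max(log_spam, log_ham)
    spam_score = math.exp(log_spam - largest)
    ham_score = math.exp(log_ham - largest)
    return spam_score / (spam_score + ham_score)
```

With likelihoods of 0.8 (spam) and 0.2 (legitimate) for a single word and equal priors, the posterior is 0.8; the filter would then compare this value with its chosen threshold.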
III. METHODOLOGY
Most spam filtering techniques are based on text
categorization methods, so spam filtering reduces to a
classification problem. In our work, rules are framed to
extract a feature vector from each email. As the
characteristics that discriminate spam from legitimate mail are
not well defined, it is more convenient to apply machine
learning techniques. Three machine learning algorithms, the
C4.5 decision tree classifier, the Multilayer Perceptron, and
the Naïve Bayes classifier, are used to learn the
classification model.
A. Multilayer Perceptron
Multilayer Perceptron (MLP) network is the most widely
used neural network classifier. MLP networks are general-
purpose, flexible, nonlinear models consisting of a number of
units organised into multiple layers. The complexity of the
MLP network can be changed by varying the number of
layers and the number of units in each layer. Given enough
hidden units and enough data, it has been shown that MLPs
can approximate virtually any function to any desired
accuracy. In other words, MLPs are universal approximators.
MLPs are valuable tools in problems when one has little or
no knowledge about the form of the relationship between
input vectors and their corresponding outputs.
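To make the idea of units organised into layers concrete, the following sketch runs a forward pass through a tiny hand-wired network. It is not the network trained in this work: the weights are set by hand, a step activation is assumed for simplicity, and the XOR function is chosen only because a single unit cannot represent it while two layers can.

```python
def step(x):
    """Threshold activation: fire (1.0) when the weighted input is positive."""
    return 1.0 if x > 0 else 0.0


def layer(inputs, weights, biases, activation):
    """Compute the outputs of one layer; each row of weights is one unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers, activation=step):
    """Feed the inputs through each (weights, biases) layer in turn."""
    values = inputs
    for weights, biases in layers:
        values = layer(values, weights, biases, activation)
    return values


# Hand-set weights (an illustrative assumption, not learned values).
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),  # hidden layer: OR unit, AND unit
    ([[1.0, -1.0]], [-0.5]),                   # output unit: OR and not AND
]
```

Calling `mlp_forward([0, 1], xor_net)` fires the output unit, while `[1, 1]` does not. Changing the number of layers or units in `xor_net` changes the complexity of the function the network can represent.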
B. C4.5 Decision Tree Induction
Decision tree classification produces its output as a
tree-like structure called a decision tree, in which each
branch node represents a choice between a number of
alternatives and each leaf node represents a classification or
decision. A decision tree model contains rules to predict
the target variable. The algorithm scales well, even with
varying numbers of training examples and considerable numbers
of attributes in large databases.
J48 is an implementation of the C4.5 decision tree
learner. The algorithm uses a greedy technique to induce
decision trees for classification. A decision tree model is
built by analyzing training data, and the model is then used to
classify unseen data. J48 generates decision trees whose nodes
evaluate the existence or significance of individual features.
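The greedy step can be illustrated with the gain ratio, the criterion C4.5 uses to choose which feature to split on at each node. The sketch below handles only discrete features and omits everything else a full learner does (continuous attributes, pruning, missing values); the function names are introduced here for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of splitting on a feature, divided by split information."""
    total = len(labels)

    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

At each node, a greedy learner would evaluate `gain_ratio` for every remaining feature, split on the best one, and recurse on each resulting subset. A rule that separates the classes perfectly scores 1.0, and a rule unrelated to the class scores 0.0.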
C. Naïve Bayes Classification
The Naïve Bayes classifier (NB) is a simple but effective
classifier that has been used in numerous information
processing applications, including natural language
processing and information retrieval. The Naïve Bayes
classifier is based on Bayes' theorem and is particularly
suited to inputs of high dimensionality. Naïve Bayes
classifiers assume that the effect of a variable's value on a
given class is independent of the values of the other
variables. The Naïve Bayes inducer computes the conditional
probabilities of the classes given the instance and picks the
class with the highest posterior. Depending on the precise
nature of the probability model, Naïve Bayes classifiers can
be trained very efficiently in a supervised learning setting.
IV. FEATURE EXTRACTION
The work is based on rules and uses a score-based system.
The rules are framed by analyzing the mail header
information, keyword matching and the body of the message,
and a relative score is assigned to each rule.
A number of rules are framed by considering the various
features that help to identify spam messages effectively. Each
rule performs a test on the email, and each rule has a score.
When an email is processed, it is tested against each rule. For
each rule found to be true for an email, the score associated
with the rule is added to the overall score for that email.
Once all the rules have been applied, the total score for the
email is compared to a threshold value. If the score exceeds
the threshold, the email is marked as spam; otherwise it is
classified as legitimate mail. The rules used in this work are
listed in Table I.
TABLE I
SCHEME OF RULES ASSIGNED TO EACH SPAM FEATURE
From name meaningful
From domain name
Blocked IP
Apostrophe in From name
From name in Auto Whitelist (AWL)
From address in User’s Block list
From address in User’s White list
Content Type
Content Boundary exists
To name meaningful
To address Undisclosed recipients
To header original
From address and To address same
Is subject present
Subject content has obfuscate words
Is forwarded message
Is reply message
Subject Reply without reference header
Is message body exists
Sensual message
Repeated double quotes in body
Character set includes foreign language
More blank lines in body
Of these 23 rules, some are simple and some are
associated with one another. A simple rule could search for the
word ‘Viagra’ in the subject line of an email, while a complex
rule may involve comparing an email against an online
database of spam. Each rule adds to the overall score, so an
email that triggers only one rule, for example through the use
of the word ‘Viagra’, will not necessarily be marked as spam.
However, if an email triggers several rules, its combined score
could exceed the threshold and the email could be marked as
spam.
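The score-and-threshold scheme can be sketched as follows. The three rules shown are loosely modelled on entries of Table I, but their implementations, their scores, and the threshold used in the example are hypothetical; the paper does not list the actual values. The labels S and L follow the class labels used in this work.

```python
def subject_missing(email):
    """Rule: the message has no subject."""
    return not email.get("subject")


def subject_has_keyword(email):
    """Rule: a spam keyword appears in the subject line."""
    return "viagra" in (email.get("subject") or "").lower()


def from_equals_to(email):
    """Rule: the From address and the To address are the same."""
    sender = email.get("from")
    return sender is not None and sender == email.get("to")


# (rule, score) pairs; the scores are illustrative assumptions.
RULES = [
    (subject_missing, 1.5),
    (subject_has_keyword, 2.0),
    (from_equals_to, 2.5),
]


def score_email(email, rules, threshold):
    """Add up the scores of all rules that fire and compare with the threshold."""
    total = sum(score for rule, score in rules if rule(email))
    label = "S" if total > threshold else "L"
    return total, label
```

With an assumed threshold of 3.0, an email that only mentions the keyword scores 2.0 and stays legitimate, while one that also has identical From and To addresses scores 4.5 and is marked as spam.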
V. EXPERIMENT AND RESULTS
The email spam filtering has been carried out using
WEKA. WEKA is an open-source, portable, GUI-based workbench
that collects state-of-the-art machine learning algorithms and
data pre-processing tools.
The training dataset, a corpus of spam and legitimate
messages, was generated from the mails received on our
institute mail server over a period of six months. The mails
were analyzed and 23 rules were identified that greatly ease
the process of classifying spam messages. The corpus consists
of 750 spam messages and 750 legitimate messages. From
the corpus, the feature vectors are extracted by analyzing the
message header, keyword checking, whitelist/blacklist, etc.
The class labels L and S represent legitimate and spam
messages respectively.
The Naïve Bayes classifier, the C4.5 decision tree classifier,
and the Multilayer Perceptron are trained on the dataset in
the WEKA environment.
Training is carried out with the feature vectors
extracted by analyzing each message header, keyword
checking, and whitelist/blacklist.
The performance of the trained models is evaluated
using 10-fold cross-validation for predictive accuracy.
Predictive accuracy is used as the performance measure for
email spam classification. It is measured as the ratio of the
number of correctly classified instances in the test dataset
to the total number of test cases.
In spam filtering, a false negative means that a spam mail is
classified as legitimate and moved to the inbox. A false
positive means that a legitimate email is mistakenly
identified as spam and moved to the spam folder or discarded.
For most users, missing legitimate email is an order of
magnitude worse than receiving spam. The false positive rate
of each classifier is therefore also considered when measuring
its performance.
The performance of the classifiers is summarized in
Table II and shown in Fig. 2 and Fig. 3.
TABLE II
COMPARATIVE RESULTS OF THE CLASSIFIERS
Evaluation Criteria              Naïve Bayes   J48     MLP
Training time (secs)             0.15          0.20    138.05
Correctly Classified Instances   1479          1449    1490
Prediction Accuracy (%)          98.6          96.6    99.3
False Positive (%)               5             4       1
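As a quick arithmetic check, the prediction accuracies in Table II follow from the correctly classified counts and the corpus size of 750 + 750 = 1500 messages. The sketch below reproduces that calculation. It also shows one common way to compute a false positive percentage, as a share of legitimate messages; the paper does not state which denominator its figures use, so that function is an assumption and is not applied to the table.

```python
def accuracy_percent(correct, total):
    """Correctly classified instances as a percentage of all test cases."""
    return 100.0 * correct / total


def false_positive_percent(false_positives, true_negatives):
    """Legitimate mails wrongly marked as spam, as a percentage of all
    legitimate mails (an assumed definition, see the note above)."""
    legitimate_total = false_positives + true_negatives
    return 100.0 * false_positives / legitimate_total


CORPUS_SIZE = 750 + 750  # spam + legitimate messages

# Correctly classified instances reported in Table II.
correct_counts = {"Naive Bayes": 1479, "J48": 1449, "MLP": 1490}

accuracies = {
    name: round(accuracy_percent(count, CORPUS_SIZE), 1)
    for name, count in correct_counts.items()
}
```

The resulting `accuracies` dictionary holds 98.6 for Naïve Bayes, 96.6 for J48 and 99.3 for MLP, matching the accuracy row of the table.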
The performance of the three models was evaluated
based on three criteria: prediction accuracy, learning time,
and false positive rate. The Multilayer Perceptron predicts
better than the other algorithms.
Fig. 2 Classification Accuracy (accuracy %: Naïve Bayes 98.6, J48 96.6, MLP 99.3)
Fig. 3 Learning Time of the Models (build time in seconds: Naïve Bayes 0.15, J48 0.2, MLP 138.05)
The Multilayer Perceptron, the neural network classifier,
takes more time to build the model. Naïve Bayes, the
probabilistic classifier, and the decision tree model learn
more rapidly on the given dataset.
VI. CONCLUSION
Although many email spam filtering tools exist, the
persistence of spammers and their adoption of new techniques
make email spam filtering a challenging problem for
researchers. In our work, we generated a corpus of spam and
legitimate messages from recent mails and employed machine
learning techniques to build the models. The performance of the
models was evaluated using 10-fold cross-validation, and we
observed that the Multilayer Perceptron classifier outperforms
the other classifiers and that its false positive rate is also
very low compared to the other algorithms. Email spam filters
using this approach can be deployed either at the mail server
or at the mail client to reduce the amount of spam and the
associated productivity loss, bandwidth use, and storage use.
ISSN : 0975-3397
3129
... In content based methods ML Classifiers like a Naive Bayesian classifier, Neural Networks, SVM, and knearest neighbor are often utilized to make automatic filtering criteria and categorize messages employing content focused methods. This approach assesses terms, their occurrence, also the distributions in mail text before employing developed algorithms to filter incoming spam emails [5]. Heuristic Methods uses previously devised criteria or heuristics, In this technique evaluates a great number of different patterns that often are regular expres-sions, in contrast to a specified text. ...
... Spam is defined as any communication with a percentage higher than a specific threshold otherwise, It is considered legitimate. While certain scoring rules do not change and evolve, others must be changed periodically to ad-dress the threat of spammers who frequently launch new spam emails that can easily in-filtrate through email filters [5]. Spam Assassin [6] is a nice example of a rule-focused spam filter.cj ...
... The data is then divided into 2 vector groups. Lastly, to identify whether coming emails are spam or ham [5]. While Adaptive technique for spam email detection and filtration are done by classifying it into several categories. ...
Preprint
Full-text available
Since the advent of email services, spam emails are a major concern because users’ security depends on the classification of emails as ham or spam. It’s a malware attack that has been used for spear phishing, whaling, clone phishing, website forgery, and other harmful activities. However, various ensemble Machine Learning (ML) algorithms used for the detection and filtering of spam emails have been less explored. In this research, we offer a ML based optimized algorithm for detecting spam emails that have been enhanced using Hyper-parameter tuning approaches. The proposed approach uses two feature extraction modules, namely Count-Vectorizer and TFIDF-Vectorizer that provide the most effective classification results when we applied them to three different publicly available email data sets: Ling Spam, UCI SMS Spam, and Proposed dataset. Moreover, to extend the performance of classifiers we used various ML methods such as Naive Bayes (NB), Logistic Regression (LR), Extra Tree, Stochastic Gradient Descent (SGD), XG-Boost, Support Vector Machine (SVM), Random Forest (RF), Multi Layer Perception (MLP), and parameter optimization approaches such as Manual search, Random search, Grid search, and Genetic algorithm. For all three data sets, the SGD outperformed other algorithms. All of the other ensembles (Extra Tree, RF), linear models (LR, Linear-SVC), and MLP performed admirably, with relatively high precision, recall, accuracies and F1-Score.
... The reason to filter spam e-mails is to save money, time, and resources consumed by spam e-mails. E-mail spam filters can be implemented on mail servers or clients to filter and minimize spam e-mails and avoid bandwidth mutilation [2,3,4,5]. Spammers shift subjects and make concerns appealing to people. Over time, the spam e-mail industry becomes increasingly complex and unexpected for spam e-mail filters [7]. ...
... The objective of developing an anti-spam approach is to conserve crucial resources that must be used for the primary function of e-mail, which is communication rather than advertisement and marketing [4,8]. Misclassification occurs when a spam e-mail is misidentified as a valid e-mail and vice versa. ...
Preprint
Full-text available
This paper proposes a content-based spam email classification by applying various text preprocessing techniques like Stopping, Stemming, and Lemmatization. NLP techniques have been applied to preprocessing techniques to avoid loss of information while preprocessing. Machine learning algorithms like Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) are applied to classify an email as ham or spam. The pre-processing technique used in this model has greatly enhanced the classification results. Two standard datasets Enron and SpamAssassin are used to evaluate the performance of the models. On the Enron dataset, it scored an impressive accuracy of 98.3%, and on the SpamAssassin dataset, it provided an even greater accuracy of 99.2%. These outcomes demonstrate the efficacy of preprocessing techniques for the classification of spam emails. Further, the proposed model's accuracy was validated using a dataset of personal emails sourced from Yahoo mailbox. The Yahoo inbuilt classifier offered an accuracy of 89%, however, the proposed models provided a staggering 97% accuracy on the personal email dataset. The experiment on the personal email dataset indicates the model's suitability for real-world email contexts, indicating its potential effectiveness in spam email categorization.
... Spammers often exploit email attachments and packed URLs to lure users into online scams. Despite the availability of keyword-based filtering rules, spam filters face challenges in effectively blocking spam emails [51]. ...
Article
Full-text available
Recent research indicates a notable surge in SMS spam, posing as entities aiming to deceive individuals into divulging private account or identity details, commonly termed "phishing" or "email spam". Conventional spam filters struggle to adequately identify these malicious emails, leading to challenges for both consumers and businesses engaged in online transactions. Addressing this issue presents a significant learning challenge. While initially appearing as a straightforward text classification problem, the classification process is complicated by the striking similarity between spam and legitimate emails. In this study, we introduce a novel method named "filter" designed specifically for detecting deceptive SMS spam. By incorporating features tailored to expose the deceptive techniques employed to dupe users, we achieved an accurate classification rate of over 99.01% for SMS spam emails, while maintaining a low false positive rate. These results were attained using a dataset comprising 746 instances of spam and 4822 instances of legitimate emails. The filter's accuracy, evaluated on a dataset with two attributes and 5568 instances, notably surpasses existing methodologies. Our proposed model, a Hybrid NB-ANN model, achieves the highest accuracy at 99.01%, outperforming both Naïve Bayes (98.57%) and Artificial Neural Network (98.12%). This highlights the efficacy of the hybrid approach in enhancing accuracy for email spam detection and malware filtering, ensuring comprehensive coverage across training and test datasets for improved feedback loops.
... The most widely used and advanced methods of machine learning used in spam filtering is the Naïve Bayes Classification, which is a filtering method based on content. This method normally analyses words, the occurrence, and distributions of words and phrases in the content of emails and used then use generated rules to filter the incoming email spams [2]. Another common approach of classifying spam is knowledge engineering. ...
Article
Full-text available
With the rising number of spam email, the need of more sufficient antispam filter is surging. Phishing attack can lead to extremely large losses of companies and individual, even more than 1 billion dollars in one year. This paper investigates and combines Naïve Bayes Classification and clustering algorithm in the application of identifying spam emails. With sample emails to create a dynamic dictionary containing most frequent words in spam and normal emails, this distribution of spam filter will provide a stricter method to prevent spam emails than those methods used in mail companies, e.g., Google, Yahoo, and Outlook.com. Besides, this paper also compares several algorithms used today in classifying spams and the future techniques of deep learning and machine learning’s application in classifying spam emails. According to the analysis, Google’s algorithm has the most comprehensive function, but such algorithm has less strict rule than Yahoo’s. Outlook.com, as a combination of Microsoft application, it has a unique algorithm for encrypting and filtering spams. Overall, these results shed light on guiding further exploration of both comprehensive and strict rule for classifying spams.
... The threat posed by spam to users is increasing year on year, with spam accounting for over 77% of global email traffic. Spam wastes unnecessary reading time, takes up storage space in mailboxes, and consumes communication bandwidth and CPU power [1]. This thesis provides a systematic analysis and study of the characteristics of spam. ...
Article
Full-text available
As technology continues to advance, email is used in almost every field. However, the dramatic increase in the number of spam emails has led to a growing need for accurate and powerful spam classifiers. Unfortunately, spam in its many forms is constantly being updated as the Internet evolves, and the challenge of fighting spam is enormous. With the emerging of deep learning methods, deep based algorithms are widely applied in most of the real-world scene and tasks. In this paper, we used CNN, RNN, LSTM and naïve Bayes models to implement spam filtering, and compared them based on information such as accuracy, model strengths and weaknesses. We test all the models on the public dataset and measure them by four metrics, including accuracy, precision, recall and F1-score. Finally, we conclude that the naïve bayes model achieves the best performance among all those methods, which can deal with the threat of spam efficiently.
... This study discusses numerous academic attempts to solve the spam problem using machine learning techniques. Support Vector Machine has been utilised in another suggested system [10] to distinguish between spam and ham e-mails. Utilising feature selection to implement Ant Colony Optimization yielded more precise and promising results. ...
Article
Employing individuals via the Internet has been a boon for businesses in the modern day. It is much simpler and more convenient than traditional recruitment methods. However, several scammers are abusing this platform, which may result in financial and privacy loss for job seekers and damage to the reputable organisation's name. In this research, we proposed a technique for detecting Online Recruitment Fraud (ORF). This model uses a publicly available dataset containing 17,780 job postings. We apply the four classification models to determine which classification model performs best for our suggested model. In this model, we use decision trees, random forests, Naive Bayes and logistic regression methods. We have estimated and evaluated the accuracy of several prediction systems. The random forest classifier provides the greatest accuracy, 97.16%, on our dataset. We have endeavoured to develop a method for detecting bogus recruiting postings.
Chapter
At present, email has become a necessary communication medium for exchanging messages and is considered an important part of business, commerce, government, education, entertainment, and other fields in various countries. As it is known that everything has its pros and cons, in spite of benefitting society everybody have to face its drawbacks, which include email spamming. Email spam or junk emails are unwanted emails which are annoying as well as dangerous, containing links trying to corrupt the computer system, stealing your bank details, and stealing your identity with the help of Botnet or real humans, fraudulent access to the information through data breaches. Spam filtering algorithms detect undesired, infected mail and block these messages from reaching to user’s inboxes and keeping their email servers safe from getting overloaded. These are adaptable and provide sustainability to all the spam detected and provide security to the emails. It is important to make the network and system to be free from spammers, malicious links, and viruses. In this research, it has been demonstrated that the K-Nearest Neighbor algorithm by storing the training data and the class labels and is used for classification as well as regression and Random Forest containing more than one decision tree on different subsets improves the accuracy and achieve high accuracy rates, often above 95% by classifying email into spam and ham.
Chapter
Artificial Intelligence and Data Science in Recommendation System: Current Trends, Technologies and Applications captures the state of the art in usage of artificial intelligence in different types of recommendation systems and predictive analysis. The book provides guidelines and case studies for application of artificial intelligence in recommendation from expert researchers and practitioners. A detailed analysis of the relevant theoretical and practical aspects, current trends and future directions is presented. The book highlights many use cases for recommendation systems: - Basic application of machine learning and deep learning in recommendation process and the evaluation metrics - Machine learning techniques for text mining and spam email filtering considering the perspective of Industry 4.0 - Tensor factorization in different types of recommendation system - Ranking framework and topic modeling to recommend author specialization based on content. - Movie recommendation systems - Point of interest recommendations - Mobile tourism recommendation systems for visually disabled persons - Automation of fashion retail outlets - Human resource management (employee assessment and interview screening) This reference is essential reading for students, faculty members, researchers and industry professionals seeking insight into the working and design of recommendation systems.
Book
At present, we live in a self-motivated and dynamic global society where technologies and challenges are unexpectedly changing overnight. These rapid changes in globalization and technological advances are creating new market forces every day. Therefore, day-to-day innovation is essential for any business or institution to survive and flourish in such an atmosphere. Though, innovation is no longer just to create value to do good to individuals, societies, or organizations. The utmost purpose of innovation is to create a smart futuristic society where people can enjoy the best quality of life using natural resources and manmade technologies including cloud-IoT technologies, and industry4.0. Hence, the innovators and their innovations must search for intelligent solutions to tackle major socio-technical problems and remove barriers of rural, urban and smart city societies. This book provides in-depth knowledge in the areas of convergence of cloud-IoT technologies and industry 4.0 with society 5.0, machine-to-machine communication, machine-to-person communication, techno-psychological perspective of society 5.0, sentiment analysis of smart digital societies, multi-access edge computing for 5G networks, discovery & location reporting of multi-access edge enabled clients/servers, m-health systems, enhancing the concert of M-health technologies in smart societies, supervising communication services in smart societies, life quality enhancement in smart city societies, multiple disease infection predictions, and societal opinion mining algorithms for smart cities societies using cloud-IoT integrated intelligent machine / deep learning technologies to the readers in the distributive environment. 
In this book, the authors discuss the implementation of cloud-IoT-based machine learning technologies such as clustering, the Naïve Bayes classifier, artificial neural networks (ANN), the Firefly algorithm, rough set classifiers, support vector machine classifiers, decision tree classifiers, ensemble classifiers, random forests, and deep learning algorithms to analyze the behavior of intelligent machines and human habits using automated data scheduling and smart digital networks. Smart digitization and the intelligent implementation of manufacturing development processes are necessities for today’s rural, urban, and smart city industries. All types of industries, including development, manufacturing, and research, are presently shifting from batch production to customized production. The fast advancements in manufacturing technologies have a deep impact on all types of societies, including those of rural areas, urban areas, and smart cities. Industry 4.0 includes the Internet of Things (IoT), the Industrial Internet, Smart Manufacturing, Cloud-based computing, and Manufacturing Technologies. The objective of this book is to establish a linkage between the Industry 4.0 components and various rural, urban and smart city societies (including Society 5.0) to bring actual prosperity, where human values, peace of mind, human relations, man-machine relations, and calmness have the utmost priority. These objectives can be achieved by integrating human societal values and social opinion mining (SOM) approaches with the existing technologies.
Chapter
The rapid growth in the volume of social media messages has created a pressing need for more reliable and robust techniques to mine social opinion from social media messages related to a particular domain. Recent machine learning techniques can be applied successfully to this task. In this chapter, the authors present a critical review and the performance of different machine learning approaches for social opinion mining (SOM). The chapter brings together different aspects of SOM, such as essential concepts, implementation details, and the efficiency of mining the opinion of society from social media messages using machine learning techniques. The basic discussion and theoretical background explore the application of machine learning strategies to SOM. The chapter also discusses the usual opinion mining process and the attempts by different researchers to tackle the pertinent problems of SOM using social media and machine learning techniques, along with the challenges associated with this task. Mathematical models of social opinion mining are proposed based on fuzzy concepts and linear algebra. Further, the chapter gives the algorithmic details of ten machine learning techniques for SOM. Finally, experiments were carried out for these ten techniques using Google embeddings and Amazon embeddings on two datasets, and it was observed that convolutional neural network (CNN) based deep learning outperforms the other machine learning techniques for SOM.
Keywords: Opinion mining; Social opinion mining; Machine learning; Social network; Word embeddings
Article
Fast, cheap and efficient, the Internet is now incontestably the communication medium of choice for personal, business and academic purposes. Unfortunately, the Internet also has a darker side: malicious activities benefit from the same speed, low cost and efficiency. Over the last decade, Internet worms drew the most attention. In recent years, spam has been invading one of the most widely used Internet services: email. This paper summarizes most of the techniques used to filter spam by analyzing email content.
Alistair McDonald, "SpamAssassin: A Practical Guide to Integration and Configuration", 1st Edition, Packt Publishers, 2004.