Detection of Users’ Abnormal Behavior on Social Networks

Nour El Houda Ben Chaabene (1,2) (✉), Amel Bouzeghoub (1), Ramzi Guetari (3), Samar Balti (3), and Henda Hajjami Ben Ghezala (2)
(1) SAMOVAR, Telecom SudParis, 19 Place Marguerite Perey, 91120 Palaiseau, France
{Nourelhouda.Benchaabene,Amel.Bouzeghoub}@telecom-sudparis.eu
(2) RIADI, National School of Computer Science, Campus Universitaire de la Manouba, 2010 Manouba, Tunisia
{Nourelhouda.Benchaabene,Henda.Benghezala}@ensi.rnu.tn
(3) LIMTIC, Higher Institute of Computer Science, 2 Rue Abou Rayhane Bayrouni, 2080 Ariana, Tunisia
Ramzi.guetari@isi.utm.tn, Samarbaltia@gmail.com
Abstract. In just a few years, social networking sites have become the most popular landmarks on the Internet. They have revolutionized the way we communicate and have socialized the Web. However, while their impact is now undeniable, it takes a variety of forms, not all of them positive. As a result, the detection of anomalies on social networks is an active research topic that has attracted researchers since the 2000s. This problem is of crucial importance for preventing abnormal activities. So far, most existing work has been devoted to one-dimensional networks. Our approach provides a new anomaly detection method that examines the relationships between Online Social Network (OSN) users using multidimensional networks.
1 Introduction
The term social network is ambiguous. In the strict sense, a social network refers to the relationships that people have with one another and the way in which these relationships are structured; these relationships help us to understand the behavior of individuals. Today, however, people commonly use the term to designate an application dedicated to communication or, more specifically, a social networking service that lets them keep in touch with family, friends, or coworkers over the Internet and also meet new people. An Online Social Network (OSN) is an asset for those who want to learn, broaden their culture, discover new areas, or communicate. However, an OSN exposes its users to a multitude of dangers, especially young users, who are the most vulnerable because they lack the hindsight or the experience to recognize a risky situation or potentially harmful content.
© Springer Nature Switzerland AG 2020
L. Barolli et al. (Eds.): AINA 2020, AISC 1151, pp. 617–629, 2020.
https://doi.org/10.1007/978-3-030-44041-1_55
Several methods have been proposed in the literature to solve the problem of anomaly detection in OSNs. Two main families of techniques are discussed: behavior-based anomaly detection and structure-based anomaly detection. The behavioral approach analyzes the user's behavior; it deals with how interactions occur between pairs of users and with other users in the system. In contrast, the structural approach focuses primarily on particular types of structure in the social network graph. According to [1], the structural properties of a graph make the structural approach particularly important relative to the behavioral approach. Several works have addressed the problem of detecting anomalies in OSNs by using monodimensional graphs, in which the nodes are interconnected by a single type of link. Given the evolution and synchronization of social networks, however, two users can be related on several social networks at the same time. In this context, representing the links between users by a multidimensional graph has three major advantages: (1) it clarifies the type of link between users as the communication rate increases, (2) it reduces information loss, and (3) it collects more information about each user.
The rest of the paper is organized as follows: Sect. 2 presents a literature review of anomaly detection in OSNs. Section 3 describes our approach to detecting abnormal behavior by examining user interactions on a multidimensional network. Section 4 presents an evaluation of the results obtained, before we conclude in Sect. 5.
2 State of the Art
Abnormal activities in social networks are deviations from usual and legal activities. The detection of abnormal behavior is defined in the literature as the detection of surprising data: a situation in which a point belonging to class A is in reality placed in class B [16]. Hawkins [7] defined an anomaly as an observation that deviates so much from other observations that it arouses the suspicion that it was generated by a different mechanism. Kaur and Singh [8] considered anomaly detection as analogous to novelty detection, in which new patterns are observed in the data. Researchers have examined a significant number of solutions for detecting anomalies in OSNs. These methods fall into two groups of techniques: behavior-based techniques [9–12] and structure-based techniques [13–18]. Structure-based approaches focus on particular types of structure that abnormal activities create in the social network graph. The results of this research have shown the important role of structural properties relative to properties extracted from the user's behavior [1], as well as the efficiency of features collected from the graph topology [14]. In this context, we are only interested in works that exploit graphs, on the premise that the topological features of social networks are sufficient to detect abnormal users. In this section, we summarize some of the computing solutions developed for the analysis, detection, and prediction of users' behavior on social networks.
Akoglu et al. [16] presented the OddBall algorithm, a fast, unsupervised method for detecting anomalous nodes in weighted graphs that states the rules to be checked before a node is classified as an anomaly. This algorithm detects the deviation of abnormal behavior from a known normal behavior. The question that arises is: what is a known normal behavior? Behavior that is normal in 2019 may not have been normal in 1970. Because the same behavior changes over time, the effectiveness of OddBall is not established, especially since the graphs on which it was tested are not time-evolving graphs.
Hassanzadeh et al. [17] proposed a framework based on the calculation of a number of graph measures (ego, egonet, super egonet, centrality, community, etc.). This framework first detects the general pattern followed by most nodes, then computes an outlier score for each node from its distance to the fitted line in order to single out users who may be abnormal, and finally computes a threshold that minimizes the number of false negatives and the false-positive rate. This work relies on the fact that social networks have a community structure, in which the majority of users belong to a small number of communities. Users with abnormal behavior, on the other hand, tend to establish random relationships with users belonging to different communities. This method has been applied to static datasets from online social networks.
Rezaei et al. [18] proposed a methodology based on the calculation of graph metrics. This methodology uses the OddBall anomaly measurement formula [16], which is an effective way to detect abnormal behavior in OSNs. This work tagged 100 nodes as having a high probability of following an abnormal pattern. The results indicated that abnormal users have a small number of mutual friends with their friends. The method includes no temporal constraints and is therefore not dynamic.
Fire et al. [14] suggested an algorithm for the detection of spammers and fake profiles in social networks. This algorithm takes into consideration the communities built from the relationships between users. It assumes that abnormal users connect randomly to other users belonging to different communities. It is questionable whether relying solely on the analysis of the topology of the social network structure is sufficient. The algorithm was evaluated by running it on different static social network graphs.
Zheleva et al. [15] presented a framework for predicting the type of relationship between users of social networks. This work combines social networking links and affiliation links to instantiate friendship and family sub-graphs, with the purpose of identifying dynamic anomalies that anticipate future events. Results from three social media sites validated the effectiveness of the proposed framework. This method can only be applied to datasets where the links to the groups are predefined.
Chen et al. [26] developed an anomaly detection algorithm based on building communities in a dynamic network. This evolutionary algorithm showed the effectiveness of communities in dynamic networks compared with an unrepresentative algorithm on static networks. In this approach, communities are the most important metric extracted from a graph. It allowed the detection of six possible types of community-based anomaly in evolutionary networks.
The previous works share a common objective: the detection of anomalies in social network graphs. We can classify these methods into four subclasses: (1) methods based on static graphs without the construction of communities [16,18], (2) methods based on static graphs with the construction of communities [14,17], (3) methods based on dynamic graphs without the construction of communities [15], and (4) methods based on dynamic graphs with the construction of communities [26]. The evolving structure of social networks and the drift of users' behavior over time call for a method that computes communities on a dynamic graph. In the literature, only the work of [26] meets both requirements.
All the works analyzed above studied information networks in the one-dimensional context, where the nodes are interconnected by a single type of link. In reality, however, actual data form complex information networks. Reducing the complexity of interactions to a single type of link diminishes the richness of communication and leads to a considerable loss of information. Chouchane and Bouguessa [19] likewise addressed the problem of detecting atypical nodes. Their work is based on a particular type of graph: a multidimensional graph. The proposed method uses a static multidimensional graph in which the nodes represent the users and the edges represent the relations between the users in the different dimensions of the graph. The solution rests on the hypothesis that an anomaly is a node sparsely connected to other nodes of the network in all dimensions. As a result, nodes with common neighbors receive a high AS(u) score, whereas randomly connected nodes that do not share enough neighbors with the rest of the network receive a low one. Thus, nodes with atypical connections have low scores compared to densely connected nodes. To avoid having to choose anomalous nodes manually on the basis of a low score, the Beta distribution [19,20] is used to classify anomalies automatically. This method has two limitations. First, it assumes that an anomaly is a node with sparse connections that does not belong to any dense region in all dimensions of a multidimensional network, yet a node can be separated from other nodes simply because it is new to the network and therefore does not have many links. Second, it deals only with static networks and does not take into consideration the community structure of the network.
The need for anomaly detection, together with the limited amount of work on multidimensional networks [19], motivated us to develop a method that addresses the problem of anomaly detection in such networks.
3 Anomaly Detection Method
Nowadays, social networks are multiplying and are sometimes difficult to manage. The same user can have multiple accounts on different social networks, which makes synchronization between these accounts necessary. For example, synchronization allows a user to publish a photo simultaneously on Instagram and Facebook in one click. Little work has been done in the field of anomaly detection on multidimensional networks, and this scarcity of papers reflects the difficulty of the problem. In this section, we present a method for detecting atypical nodes based on the analysis of the topology of a multidimensional graph.
3.1 Notation
In our approach, we adopt the notation used in [21] to analyze the structure of a multidimensional graph. An undirected multi-graph G is defined by the triplet (V, E, D), where V is a set of nodes, E is a set of edges, and D is a set of dimensions. An edge e ∈ E is a triplet (u, v, d), where u, v ∈ V are nodes and d ∈ D = {Twitter, Facebook, Instagram, ...} is a dimension. The triplet (u, v, d) specifies that nodes u and v are connected by an edge that belongs to dimension d. Figure 1 shows an example of a multidimensional network.
Fig. 1. Example of a multidimensional network
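The triplet notation above maps directly onto a small data structure. The following Python sketch is only an illustration: the user names, the edge list, and the helper names `build_multigraph` and `dimensions_of` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical toy edge list: each edge is a triplet (u, v, d).
EDGES = [
    ("alice", "bob", "Facebook"),
    ("alice", "bob", "Twitter"),
    ("bob", "carol", "Facebook"),
    ("carol", "dave", "Instagram"),
]

def build_multigraph(edges):
    """Return {dimension: {node: set of neighbors}} for an undirected multigraph."""
    adj = defaultdict(lambda: defaultdict(set))
    for u, v, d in edges:
        adj[d][u].add(v)
        adj[d][v].add(u)
    return adj

def dimensions_of(adj, u):
    """Sorted list of the dimensions in which node u has at least one edge."""
    return sorted(d for d, nodes in adj.items() if u in nodes)

graph = build_multigraph(EDGES)
```

Keeping one adjacency map per dimension makes it straightforward to run the per-dimension steps of the method and to count the dimensions in which a node is present.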
Local properties of the graph are used to detect atypical nodes. These properties concern a single node (an ego) and its first-level neighborhood (an egonet). Our approach operates in three phases: (1) detection of communities in the different dimensions of the graph, (2) estimation of an anomaly score for each node, and (3) automatic classification of the estimated anomaly scores via the Beta distribution.
3.2 Phase 1: Detection of Communities in Different Dimensions
The goal of community detection is to cluster the nodes of the graph into groups that share common characteristics. This holds in the context of online social networks [22]: users tend to form communities based on their preferences and common interests. A range of techniques has been presented in various works to address this general problem. An interesting work [17] showed that the contribution of community detection lies in the usefulness of the information extracted from the structure of the communities formed. This information facilitates the analysis of a user's behavior and allows abnormal behavior to be identified. In [16], the authors state that egonet(u) forms a community with egonet(v) if at least half of the nodes of the smaller egonet connect to the other egonet. Applying Eq. (1) [17] allows us to calculate the communities of a graph.
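\[
\mathrm{Com}(u, v) =
\begin{cases}
1, & \text{if } \mathrm{degree}(u, v)_{norm} \geq \min(|u|, |v|)/2 \\
0, & \text{otherwise}
\end{cases}
\tag{1}
\]
knowing that:
Equation (2): Normalized external degree
\[
\mathrm{degree}(u, v)_{norm} = \frac{\mathrm{degree}(u, v)}{\min(|u|, |v|)}
\tag{2}
\]
Equation (3): External degree of egonet(u) to egonet(v)
\[
\mathrm{degree}(u, v) = \left| V_{egonet(u)} \cap V_{egonet(v)} \right| + \left| \{ u'v' \in E : u' \in V_{egonet(u)},\ v' \in V_{egonet(v)} \} \right|, \quad u, v \in G
\tag{3}
\]
where V_{egonet(u)} is the set of nodes of egonet(u) and V_{egonet(v)} is the set of nodes of egonet(v).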
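A minimal Python sketch of Eqs. (1)–(3) on a single dimension may help to make the definitions concrete. It rests on several assumptions that the text does not state: |u| is taken to be the number of nodes in egonet(u), the ego belongs to its own egonet, each undirected edge between the two egonets is counted once, and the comparison in Eq. (1) is "greater than or equal". The toy adjacency map `ADJ` is hypothetical.

```python
def egonet(adj, u):
    """Node set of egonet(u): the ego u plus its direct neighbors (assumed definition)."""
    return {u} | adj.get(u, set())

def external_degree(adj, u, v):
    """Eq. (3): shared nodes plus edges running between the two egonets."""
    eu, ev = egonet(adj, u), egonet(adj, v)
    shared = len(eu & ev)
    # Count each undirected edge {a, b} once, with a in egonet(u) and b in egonet(v).
    crossing = {frozenset((a, b)) for a in eu for b in adj.get(a, set()) if b in ev}
    return shared + len(crossing)

def normalized_degree(adj, u, v):
    """Eq. (2): external degree divided by the size of the smaller egonet."""
    smaller = min(len(egonet(adj, u)), len(egonet(adj, v)))
    return external_degree(adj, u, v) / smaller

def com(adj, u, v):
    """Eq. (1): 1 if the two egonets form a community, 0 otherwise."""
    smaller = min(len(egonet(adj, u)), len(egonet(adj, v)))
    return 1 if normalized_degree(adj, u, v) >= smaller / 2 else 0

# Hypothetical single-dimension graph: a triangle {a, b, c} and a separate pair {d, e}.
ADJ = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e"}, "e": {"d"},
}
```

On this toy graph the egonets of `a` and `b` coincide, so they form a community, while `a` and `d` share neither nodes nor edges and do not.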
3.3 Phase 2: Anomaly Score Estimation
In this section, we develop a method that estimates an anomaly score between 0 and 1 for each node of the multidimensional network in order to decide on the nature of the user's behavior. First, we calculate the anomaly score AS(u) of each node in each dimension d_i. Then, we calculate two other scores, DE(u) and nbct(u), in order to specify the influence of node u on the nodes belonging to its community. Finally, a total anomaly score AST(u) is estimated.
Step 1: We start by calculating the anomaly score of each node in each existing dimension of our multidimensional network. The node can have a score of 0, 0.5, or 1. This score is attributed according to the influence of the node on its community (see Eq. (4)).
\[
AS(u)_{d_i} =
\begin{cases}
1, & \text{if } (u \in \mathrm{Com}) \text{ and } (u \text{ influences the construction of the community}) \\
0.5, & \text{if } (u \in \mathrm{Com}) \text{ and } (u \text{ does not influence the construction of the community}) \\
0, & \text{if } (u \notin \mathrm{Com}) \text{ and } (u \in d_i)
\end{cases}
\tag{4}
\]
A question that arises in the first case is: how can a node strongly influence the construction of the community? The answer requires the calculation of two scores. The first score, DE(u), denotes the distance between ego(u) and ego(v), and the second score, nbct(u), represents the total number of direct links from ego(u) to ego(v). Equations (5), (6), and (7) respectively show how to calculate the three scores DE(u), nbct(u), and nbc(u) in each dimension.
\[
DE(u)_{d_i} = \frac{\text{number of possible outgoing links from node } u \text{ to all the nodes of } \mathrm{Com}(u)}{\text{number of outgoing links from node } u \text{ to its neighbors}}
\tag{5}
\]
\[
nbct(u)_{d_i} = \frac{nbc(u)_{d_i}}{\text{number of nodes that form } \mathrm{Com}(u)}
\tag{6}
\]
knowing that:
\[
nbc(u)_{d_i} = \text{number of direct links from } egonet(u) \text{ to } egonet(v)
\tag{7}
\]
The comparison of the two scores DE(u) and nbct(u) allows us to deduce the degree of influence of node u on the nodes of its community. If the score DE(u) is greater than or equal to the score nbct(u), then node u has a relation of average degree with the nodes that belong to its community. If the score DE(u) is lower than the score nbct(u), then node u is strongly connected to the nodes that form its community, as expressed in Eq. (8).
\[
AS(u)_{d_i} =
\begin{cases}
1, & \text{if } (u \in \mathrm{Com}) \text{ and } DE(u) < nbct(u) \\
0.5, & \text{if } (u \in \mathrm{Com}) \text{ and } DE(u) \geq nbct(u) \\
0, & \text{otherwise}
\end{cases}
\tag{8}
\]
Step 2: The total anomaly score of each node u is computed from the sum of the anomaly scores of node u in each domain (that is, each dimension d_i) and the number of domains in which node u exists. Equation (9) gives the formula for calculating the total anomaly score of each node.
\[
AST(u) = \frac{\sum_{d_i} AS(u)_{d_i}}{\text{number of domains where } u \text{ exists}}
\tag{9}
\]
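The scoring rule of Eqs. (8) and (9) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, DE(u) and nbct(u) are taken as already computed inputs, and the numeric values in the example are invented.

```python
def anomaly_score(in_community, de, nbct):
    """Eq. (8): per-dimension score AS(u)_di from community membership, DE(u) and nbct(u)."""
    if not in_community:
        return 0.0
    return 1.0 if de < nbct else 0.5

def total_anomaly_score(scores_by_dimension):
    """Eq. (9): average of the per-dimension scores over the dimensions where u exists."""
    if not scores_by_dimension:
        raise ValueError("node is absent from every dimension")
    return sum(scores_by_dimension.values()) / len(scores_by_dimension)

# Hypothetical node present in three dimensions (all values are illustrative).
example_scores = {
    "Facebook": anomaly_score(True, de=0.4, nbct=0.7),    # strongly connected
    "Twitter": anomaly_score(True, de=0.9, nbct=0.5),     # average connection
    "Instagram": anomaly_score(False, de=0.0, nbct=0.0),  # outside any community
}
example_ast = total_anomaly_score(example_scores)
```

Because the denominator counts only the dimensions in which the node exists, a user who is absent from one network is not penalized for that absence.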
3.4 Phase 3: Automatic Detection of Abnormal Behavior
In the literature, there are two common solutions to the classification problem. The first is to sort the AST(u) scores in increasing order and select the first or last k nodes. The main difficulty of this solution is specifying a value of k that is appropriate for all datasets, which can lead to errors. The second solution is to set a separation threshold between AST(u) scores. Identifying a threshold that is appropriate for all types of data is, however, a difficult task. To remedy this, we use the Beta mixture model, which is an effective way to solve this type of problem [23–25]. It is then sufficient to determine the conditions under which the mixture model is applied in order to recognize the nature of the behavior.
Unlike other statistical distributions, the Beta distribution is adaptable and flexible enough to model complex and variable situations [27]. For example, the Gaussian distribution only allows us to model symmetric modes, which may result in a less adequate model of the data [28]. The Beta distribution can also take various shapes: the U shape, the L shape, and the shape of a line [27], which shows its strong adaptability for accurate modeling of our anomaly scores.
The automatic anomaly detection phase requires the application of two algorithms: (1) the algorithm for estimating the optimal number of components (p) of the Beta mixture and (2) the algorithm for the automatic identification of abnormal nodes [19]. These two algorithms can be summed up in three steps: (1) the estimation of the parameters of a component, (2) the application of the EM (Expectation Maximization) algorithm for the Beta distribution, and (3) the estimation of the optimal number of components.
3.4.1 Introduction to the Beta Distribution Model
In probability theory and statistics, the Beta distribution is a family of continuous probability distributions defined on the interval [0, 1] and parametrized by two shape parameters, typically denoted α and β. In our approach, the AST(u) scores estimated in the previous phase lie between 0 and 1. These scores are distributed according to the Beta distribution, AST ~ Be(α; β), whose probability density is given by Eq. (10):
\[
B(AST) = \frac{1}{Be(\alpha; \beta)} \, AST^{\alpha - 1} (1 - AST)^{\beta - 1}
\tag{10}
\]
The Beta function Be(α; β) and the Γ function are defined by Eqs. (11) and (12), respectively.
\[
Be(\alpha; \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}
\tag{11}
\]
\[
\Gamma : z \mapsto \int_{0}^{+\infty} t^{z - 1} e^{-t} \, dt
\tag{12}
\]
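As a quick check of Eqs. (10)–(12), the density can be evaluated with the Python standard library. This is a sketch: the function names are hypothetical, and the use of `math.lgamma` instead of a direct evaluation of Γ is a numerical-stability choice made here, not something the paper prescribes.

```python
import math

def beta_function(alpha, beta):
    """Eq. (11), evaluated through log-gamma to avoid overflow for large parameters."""
    return math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))

def beta_pdf(x, alpha, beta):
    """Eq. (10), restricted in this sketch to the open interval 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)
```

For α = β = 1 the density is uniform, and for α = β = 2 it peaks at 1.5 in the middle of the interval. Scores that are exactly 0 or 1 would need to be nudged inside the interval before being passed to `beta_pdf`.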
3.4.2 Estimation of the Parameters of a Component
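To estimate the α and β parameters of a component of the Beta distribution, it is essential to calculate the empirical mean \bar{x} (see Eq. (13)) of our sample (N: the number of nodes in the graph) and the variance v (see Eq. (14)) of a component.
\[
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} AST_i
\tag{13}
\]
\[
v = \frac{1}{N} \sum_{i=1}^{N} (AST_i - \bar{x})^2
\tag{14}
\]
The estimates of α and β are calculated by Eqs. (15) and (16), respectively.
\[
\alpha = \bar{x} \left( \frac{\bar{x}(1 - \bar{x})}{v} - 1 \right)
\tag{15}
\]
\[
\beta = (1 - \bar{x}) \left( \frac{\bar{x}(1 - \bar{x})}{v} - 1 \right)
\tag{16}
\]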
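Equations (13)–(16) amount to a moment-matching estimator, which can be sketched as follows. The function name `beta_moments` and the guard against degenerate samples are assumptions added for the illustration.

```python
def beta_moments(scores):
    """Estimate (alpha, beta) from a sample of scores via Eqs. (13)-(16)."""
    n = len(scores)
    mean = sum(scores) / n                          # Eq. (13)
    var = sum((s - mean) ** 2 for s in scores) / n  # Eq. (14)
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("sample is degenerate; moment estimates are undefined")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common       # Eqs. (15) and (16)
```

For the sample [0.2, 0.4, 0.6, 0.8], the mean is 0.5 and the variance is 0.05, so the shared factor is 0.25 / 0.05 − 1 = 4 and the estimates are α = β = 2.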
3.4.3 Application of the EM Algorithm for Beta Distribution
As its name indicates, the maximum likelihood (ML) approach consists in maximizing the likelihood, i.e. maximizing
\[
L(\Theta, AST) = \prod_{i=1}^{N} \sum_{k=1}^{k} \beta_k B_k(AST_i; \alpha_k)
\]
or, equivalently, maximizing the log-likelihood
\[
l(\Theta, AST) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{k} \beta_k B_k(AST_i; \alpha_k) \right)
\]
in order to estimate the unknown parameters, with \Theta = (\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k) the unknown parameters of the parametric model. However, this maximization problem cannot be solved analytically because of the hidden data. We must find solutions using iterative algorithms. Among these algorithms is the EM algorithm [29].
This algorithm provides an estimator when the solution cannot be calculated directly because of hidden or missing data, that is, when knowledge of these data would make it possible to estimate the parameters. The EM algorithm takes its name from the fact that each iteration consists of two distinct steps:
(i) the “Expectation” phase, often referred to as the “E step”, estimates the unknown data, taking into account the observed data and the value of the parameters determined at the previous iteration;
(ii) the “Maximization” phase, or “M step”, maximizes the likelihood, which is now possible thanks to the estimate of the unknown data obtained in the previous step, and updates the value of the parameter(s) for the next iteration.
The algorithm ensures that the likelihood does not decrease from one iteration to the next, so the estimates become progressively more accurate.
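The two steps can be sketched for a Beta mixture as follows. This is a simplified illustration and not the algorithm of [19] or [29]: the M step below replaces the exact likelihood maximization, which has no closed form for Beta parameters, with weighted moment matching, so the monotone increase of the likelihood is no longer guaranteed. All names, the clipping constant `eps`, and the fixed iteration count are assumptions.

```python
import math

def beta_pdf(x, a, b):
    """Beta density, computed in log space for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def em_beta_mixture(scores, params, weights, iterations=50, eps=1e-6):
    """Fit a Beta mixture; params is a list of (alpha, beta), weights the mixing weights."""
    # Clip scores away from 0 and 1 so that the logarithms stay finite.
    xs = [min(max(s, eps), 1 - eps) for s in scores]
    for _ in range(iterations):
        # E step: responsibility of each component for each score.
        resp = []
        for x in xs:
            joint = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M step (approximate): weighted moment matching for each component.
        new_params, new_weights = [], []
        for k in range(len(params)):
            rk = [r[k] for r in resp]
            nk = max(sum(rk), 1e-12)
            mean = sum(r * x for r, x in zip(rk, xs)) / nk
            var = max(sum(r * (x - mean) ** 2 for r, x in zip(rk, xs)) / nk, 1e-9)
            common = max(mean * (1 - mean) / var - 1, 1e-3)
            new_params.append((mean * common, (1 - mean) * common))
            new_weights.append(nk / len(xs))
        params, weights = new_params, new_weights
    return params, weights
```

A typical call starts from two rough components, for example `em_beta_mixture(scores, [(1, 5), (5, 1)], [0.5, 0.5])`, and returns the refined shape parameters together with the mixing weights.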
3.4.4 Estimation of the Number of Components (p) of the Beta Distribution
To find the right model for our data, we must estimate the number of components p and the α and β parameters of each component. The number of components p varies between 1 and p_max. For each candidate number of components, performance metrics are determined in order to identify the optimal number of components. To do this, the Bayesian Information Criterion (BIC) was used [30]. The BIC is written as follows (see Eq. (17)):
\[
BIC(p) = -2 \log(L_p) + k_p \log(N)
\tag{17}
\]
with L: the likelihood of the estimated model, N: the number of observations in the sample, and k: the total number of estimated model parameters.
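Model selection with Eq. (17) can be sketched as below. Two assumptions are made that the text leaves open: the candidate with the lowest BIC is retained (the usual convention for this sign of the criterion), and a p-component model is counted as having 3p − 1 free parameters (α and β per component plus p − 1 mixing weights). The log-likelihood values in the example are invented.

```python
import math

def bic(log_likelihood, num_params, num_obs):
    """Eq. (17): BIC = -2 log(L) + k log(N)."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)

def best_component_count(log_likelihoods, num_obs):
    """Pick p minimizing BIC; log_likelihoods maps p to the maximized log-likelihood."""
    def num_params(p):
        # Assumption: alpha and beta per component, plus p - 1 free mixing weights.
        return 3 * p - 1
    return min(log_likelihoods,
               key=lambda p: bic(log_likelihoods[p], num_params(p), num_obs))

# Hypothetical maximized log-likelihoods for p = 1, 2, 3 on N = 100 scores.
candidates = {1: -120.0, 2: -80.0, 3: -79.0}
chosen_p = best_component_count(candidates, num_obs=100)
```

In this invented example the third component improves the fit only marginally, so the penalty term makes the two-component model the preferred one.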
4 Experimentation and Evaluation
This section presents an empirical assessment of the performance of our approach on a three-dimensional network. The dimensions of the tested network are: (1) Facebook, (2) Twitter, and (3) Instagram. We have no prior knowledge of node partitioning, i.e., we do not know in advance whether a node belongs to a specific category (anomalous node or normal node). Because of this, we cannot use supervised metrics that rely on the existence of a reference partitioning. In this context, we adopted an approach that consists in interpreting node deviations by manual investigation and by graphical visualization of the adjacency matrix of the network. For each node of the network, we estimated a total anomaly score. We then modeled the distribution of these scores with our probabilistic model, which is based on the Beta mixture distribution.
The result is shown by the density curve of the anomaly scores of the studied network (see Fig. 2). The density curve illustrates the flexibility and adaptability of the Beta mixture model in modeling the distributions. In Fig. 2, the first component (where the values are close to zero) corresponds to the lowest anomaly scores. As a result, the nodes whose scores are grouped in this component are identified as anomalies.
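The decision rule described above can be sketched as follows, assuming that a node is assigned to the component with the highest posterior probability and that the "first component" is the one with the lowest mean α / (α + β). These assignment details, the function names, and the mixture parameters in the example are assumptions for illustration, not values from the experiment.

```python
import math

def beta_pdf(x, a, b):
    """Beta density, computed in log space for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def flag_anomalies(scores, params, weights, eps=1e-6):
    """Return the nodes whose most probable component is the one with the lowest mean."""
    lowest = min(range(len(params)), key=lambda k: params[k][0] / sum(params[k]))
    flagged = []
    for node, s in scores.items():
        x = min(max(s, eps), 1 - eps)  # keep the score strictly inside (0, 1)
        post = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
        if max(range(len(post)), key=post.__getitem__) == lowest:
            flagged.append(node)
    return flagged

# Hypothetical fitted mixture and AST scores.
example_params = [(2.0, 20.0), (20.0, 4.0)]
example_weights = [0.1, 0.9]
example_ast = {"n1": 0.05, "n2": 0.9, "n3": 0.75, "n4": 0.0}
example_flagged = flag_anomalies(example_ast, example_params, example_weights)
```

With these invented parameters, the nodes with scores 0.05 and 0.0 fall into the low-mean component and are flagged, while the other two are treated as normal.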
For the tested network, approximately 10K of the 397K nodes were selected as nodes with atypical connections. Figure 3 presents the adjacency matrix of the three dimensions of the network, with the nodes sorted in ascending order of their anomaly scores. Having the lowest AST(u) scores, the anomalies are placed at the top of the matrix. The matrix shows that these anomalies are sparsely connected in the network, whereas the normal nodes are closely connected and appear in the matrix as dense regions.
Fig. 2. Probability density of the estimated scores
Fig. 3. Adjacency matrix of the studied network
5 Conclusion
In this article, we have studied several methods and approaches for detecting anomalies in OSNs. The reviewed works do not take the synchronization of user accounts into account. Our new approach considers OSNs as a multidimensional graph and analyzes these networks based on the relationships between the nodes. Defined graph metrics are calculated to estimate the anomaly score of each node, and a classification based on the Beta distribution is then used to detect atypical nodes. Given the quality of the results obtained, we believe that this work offers an effective means of detection that can be applied in different practical contexts. In our future research, we will explore different ways to extend this work. One possibility is the upstream analysis of a node's behavior and the prediction of the influence of nodes with abnormal behavior on the rest of the network's users.
References
1. Anand, K., Kumar, J., Anand, K.: Anomaly detection in online social network: a survey. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 456–459 (2017)
2. Grubbs, F.E.: Procedures for detecting outlying observations in samples. Technometrics 11, 1–21 (1969)
3. John, G.H.: Robust decision trees: removing outliers from databases. In: Proceedings of KDD, pp. 174–179 (1995)
4. Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. ACM SIGMOD Rec. (2002). https://doi.org/10.1145/376284.375668
5. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41, 1–72 (2009)
6. Savage, D., Zhang, X., Yu, X., Chou, P., Wang, Q.: Anomaly detection in online social networks. Soc. Netw. 39, 62–70 (2014)
7. Hawkins, D.M.: Identification of Outliers, vol. 11. Springer, Dordrecht (1980)
8. Kaur, R., Singh, S.: A survey of data mining and social network analysis based anomaly detection techniques. Egypt. Inf. J. 17, 199–216 (2016)
9. Vanetti, M., Binaghi, E., Carminati, B., Carullo, M., Ferrari, E.: Content-based filtering in on-line social networks. In: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V.S., Saygin, Y. (eds.) Privacy and Security Issues in Data Mining and ML, vol. 6549, pp. 127–140. Springer, Heidelberg (2011)
10. Holland, P.W., Leinhardt, S.: The structural implications of measurement error in sociometry. J. Math. Sociol. 3(1), 85–111 (1973)
11. Viswanath, B., Bashir, M.A., Crovella, M., Guha, S., Gummadi, K.P., Krishnamurthy, B., Mislove, A.: Towards detecting anomalous user behavior in online social networks. In: Proceedings of the 23rd USENIX Security Symposium (USENIX Security) (2014)
12. Xiao, C., Freeman, D.M., Hwa, T.: Detecting clusters of fake accounts in online social networks. In: Proceedings of the Eighth ACM Workshop on Artificial Intelligence and Security, pp. 91–101 (2015)
13. Getoor, L., Diehl, C.P.: Link mining - a survey. ACM SIGKDD Explor. Newslett. 7, 3–12 (2005)
14. Fire, M., Katz, G., Elovici, Y.: Strangers intrusion detection - detecting spammers and fake profiles in social networks based on topology anomalies. ASE Hum. J. 1(1), 26–39 (2012)
15. Zheleva, E., Getoor, L., Golbeck, J., Kuter, U.: Using friendship ties and family circles for link prediction. In: Giles, L., Smith, M., Yen, J., Zhang, H. (eds.) Advances in Social Network Mining and Analysis, vol. 5498, pp. 97–113. Springer, Heidelberg (2008)
16. Akoglu, L., McGlohon, M., Faloutsos, C.: OddBall: spotting anomalies in weighted graphs. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, vol. 13, pp. 410–421 (2010)
17. Hassanzadeh, R., Nayak, R., Stebila, D.: Analyzing the effectiveness of graph metrics for anomaly detection in online social networks. In: Proceedings of the 13th International Conference on Web Information Systems Engineering (2012)
18. Rezaei, A., Kasirun, Z.M., Rohani, V.A., Khodadadi, T.: Anomaly detection in online social networks using structure based technique. In: Eighth International Conference on Internet Technology and Secured Transactions (ICITST), pp. 619–622 (2013)
19. Chouchane, A., Bouguessa, M.: Identifying anomalous nodes in multidimensional networks. In: International Conference on Data Science and Advanced Analytics (2017)
20. Kruegel, C., Mutz, D., Robertson, W., Valeur, F.: Bayesian event classification for intrusion detection. In: Proceedings of the 19th Annual Computer Security Applications Conference, pp. 14–23 (2003)
21. Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C., Gómez-Gardeñes, J., Romance, M., Sendiña-Nadal, I., Wang, Z., Zanin, M.: The structure and dynamics of multilayer networks. Phys. Rep. 544, 1–22 (2014)
22. Yang, Y., Guo, Y.C., Ma, Y.N.: Characterization of communities in online social network. In: Proceedings of 2010 Cross-Strait Conference on Information Science and Technology, pp. 600–605 (2010)
23. Sawadogo, I., Odongo, L., Ly, I.: Maximum likelihood estimation of the parameters of exponentiated generalized weibull based on progressive type ii censored data. Open J. Stat. 7(6), 956–963 (2018)
24. Djrobie, D.: Modèle de mélange et classification. Open J. Stat (2016)
25. Couton, F., Danech, M., Broniatousk, M.: Application des mélanges de lois de probabilité à la reconnaissance de regime trafic routier. RTS-Recherche n53, 49–57 (1996)
Detection of Users’ Abnormal Behavior on Social Networks 629
26. Chen, Z., Hendrix, W., Samatova, N.F.: Community-based anomaly detection in
evolutionary networks. J. Intell. Inf. Syst. 39, 59–85 (2012)
27. Ma, Z., Leijon, A.: Bayesian estimation of beta mixture models with variational
inference. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2160–2173 (2011)
28. Boutemedjet, S., Ziou, D., Bouguila, N.: Model-based subspace clustering of
non-Gaussian data. Neurocomputing 73, 1730–1739 (2010)
29. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete
data via the EM algorithm. J. Roy. Stat. Soc. 39, 1–38 (1977)
30. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
... This can be done either statically or dynamically. In the static graph-based detection methods [12, 32–37], the analysis is done on a single snapshot of the network, while in the dynamic graph-based detection methods [38–41], the analysis is time-based and examines a series of snapshots. ...
... The detection of anomalies in multidimensional information networks is an area of current research. Although the works presented previously studied anomaly detection in one-dimensional networks thoroughly, only two recent works have tackled it in multidimensional networks [12,40]. This particular type of network is characterized by the synchronization of the studied networks. ...
... The authors of [12] define an anomaly as a sparsely connected node that does not belong to any dense region in any dimension of a multidimensional network. In contrast, Ben Chaabene et al. [40] noted that a node can be separated from other nodes because it is new to the network, in which case it has few links and might not be present in all dimensions of the network. Their work computed several graph metrics, such as user communities and the degree of membership of each node in its community, to estimate an anomaly score for each node. ...
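As a rough illustration of the kind of metric this excerpt describes, the Python sketch below computes a per-dimension degree of membership (the fraction of a node's neighbours that share its community) and averages it over the dimensions in which the node actually has links. The function names, the input layout, and the scoring formula are assumptions made for illustration only; the excerpt does not give the formula used in [40].

```python
def membership_degree(adjacency, community, node):
    """Fraction of `node`'s neighbours that share its community label.

    Returns None when the node has no links in this dimension, so a node
    that is simply absent from a dimension is not penalised for it.
    """
    neighbors = adjacency.get(node, set())
    if not neighbors:
        return None
    same = sum(1 for n in neighbors if community.get(n) == community.get(node))
    return same / len(neighbors)


def anomaly_score(layers, node):
    """Hypothetical score: 1 minus the mean membership degree.

    `layers` is a list of (adjacency, community) pairs, one per dimension.
    Dimensions where the node has no links are skipped.
    """
    degrees = []
    for adjacency, community in layers:
        degree = membership_degree(adjacency, community, node)
        if degree is not None:
            degrees.append(degree)
    if not degrees:
        return None  # node absent from every dimension: nothing to score
    return 1.0 - sum(degrees) / len(degrees)


# Two toy dimensions; community labels are assumed to be precomputed.
layer_1 = (
    {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}},
    {"a": 0, "b": 0, "c": 0, "d": 1},
)
layer_2 = (
    {"a": {"b"}, "b": {"a"}},
    {"a": 0, "b": 0},
)
layers = [layer_1, layer_2]
scores = {n: anomaly_score(layers, n) for n in ["a", "c", "d"]}
```

Skipping dimensions where a node has no links reflects the point made in the excerpt that a new node may not be present in every dimension; how such nodes should ultimately be scored is a design choice left open here.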
Article
An anomaly in an online social network can be defined as an unusual or illegal activity of an individual. It can also be considered an outlier or a surprising truth. With the emergence of social networking sites such as Facebook, Instagram, etc., the negative impact of aggressive and bullying phenomena has increased exponentially. Anomaly detection is a problem of crucial importance that has attracted researchers since the 2000s. It is often addressed with deep learning, artificial intelligence and statistics. Several methods have been devoted to detecting abnormal behavior on social media, and they fall into three types: structural methods, which are based on the analysis of social network graphs; behavioral methods, which are based on the extraction and analysis of user activities; and hybrid methods, which combine the two. This survey reviews various data mining methods for anomaly detection to provide a better assessment that can facilitate the understanding of this area.
Article
With the growing use of online social networks in different domains, social network analysis has recently become a center of research. Online Social Networks (OSNs) have attracted the interest of researchers both for the analysis of their usage and for the detection of abnormal activities. Anomalous activities in social networks are unusual and illegal activities that exhibit different behaviors from others present in the same structure. This paper discusses different types of anomalies and a novel categorization of them based on various characteristics. It reviews a number of techniques for preventing and detecting anomalies, along with their underlying assumptions and the reasons for the presence of such anomalies, and it reviews a number of data mining approaches used to detect anomalies. Special attention is given to social network centric anomaly detection techniques, which are broadly classified as behavior based, structure based and spectral based. Each of these classes comprises a number of techniques, which are discussed in the paper. The paper concludes with future directions and areas of research that could be addressed.
Conference Paper
Fake accounts are a preferred means for malicious users of online social networks to send spam, commit fraud, or otherwise abuse the system. A single malicious actor may create dozens to thousands of fake accounts in order to scale their operation to reach the maximum number of legitimate members. Detecting and taking action on these accounts as quickly as possible is imperative in order to protect legitimate members and maintain the trustworthiness of the network. However, any individual fake account may appear to be legitimate on first inspection, for example by having a real-sounding name or a believable profile. In this work we describe a scalable approach to finding groups of fake accounts registered by the same actor. The main technique is a supervised machine learning pipeline for classifying an entire cluster of accounts as malicious or legitimate. The key features used in the model are statistics on fields of user-generated text such as name, email address, company or university; these include both frequencies of patterns within the cluster (e.g., do all of the emails share a common letter/digit pattern) and comparison of text frequencies across the entire user base (e.g., are all of the names rare?). We apply our framework to analyze account data on LinkedIn grouped by registration IP address and registration date. Our model achieved AUC 0.98 on a held-out test set and AUC 0.95 on out-of-sample testing data. The model has been productionalized and has identified more than 250,000 fake accounts since deployment.
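The within-cluster pattern feature mentioned in this abstract (whether the emails in a cluster share a common letter/digit pattern) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline the abstract describes: the pattern encoding (runs of letters become "L", runs of digits become "D") and the function names are hypothetical.

```python
import re
from collections import Counter


def char_pattern(email):
    """Encode the local part of an email as a letter/digit pattern.

    Runs of ASCII letters collapse to 'L' and runs of digits to 'D';
    other characters (dots, underscores, ...) are kept unchanged.
    """
    local_part = email.split("@", 1)[0]
    pattern = re.sub(r"[A-Za-z]+", "L", local_part)
    return re.sub(r"[0-9]+", "D", pattern)


def dominant_pattern_share(emails):
    """Share of emails in the cluster that follow its most common pattern."""
    if not emails:
        return 0.0
    counts = Counter(char_pattern(e) for e in emails)
    return counts.most_common(1)[0][1] / len(emails)


# Toy cluster: three addresses follow "letters then digits", one does not.
cluster = [
    "abc123@example.com",
    "qwe456@example.org",
    "zzz789@example.net",
    "john.smith@example.com",
]
share = dominant_pattern_share(cluster)
```

A share close to 1 means nearly every address in the cluster follows one template. In a supervised setting such a value would be one feature among many rather than a decision rule by itself.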
Article
In the past years, network theory has successfully characterized the interaction among the constituents of a variety of complex systems, ranging from biological to technological, and social systems. However, up until recently, attention was almost exclusively given to networks in which all components were treated on equivalent footing, while neglecting all the extra information about the temporal- or context-related properties of the interactions under study. Only in the last years, taking advantage of the enhanced resolution in real data sets, network scientists have directed their interest to the multiplex character of real-world systems, and explicitly considered the time-varying and multilayer nature of networks. We offer here a comprehensive review on both structural and dynamical organization of graphs made of diverse relationships (layers) between its constituents, and cover several relevant issues, from a full redefinition of the basic structural measures, to understanding how the multilayer nature of the network affects processes and dynamics.