User Behavior Analysis with Machine Learning
Techniques in Cloud Computing Architectures
Matias Callara, Patrice Wira
Université de Haute Alsace, IRIMAS Laboratory, 68093 Mulhouse, France
Email: {matias-ezequiel.callara, patrice.wira}@uha.fr
Abstract—This paper presents the use of machine learning algorithms to analyze the behaviors of users working in a distributed computer environment. The objective is to discriminate groups of close users, i.e., groups composed of users with similar behaviors. Events related to the users' behaviors are recorded and transferred to a database. An approach is developed to determine the groups of users. A non-parametric probability density estimation method is used to predict application launches and session openings individually for each user. These algorithms have been implemented and have demonstrated their effectiveness within a complete virtualization environment for workstations and applications, under real conditions in a hospital.
Index Terms—Machine learning, user behavior analytics, behavior analysis, user classification, prediction
I. INTRODUCTION
Working software environments, in the broadest sense of the term, can no longer do without remote capabilities. More recently, it has been seen that mobility and productivity are linked, since the first improves the second. Indeed, in a form of ultimate culmination of mobility, the user's workstation can be a notebook computer, but also a smartphone or any other connected device such as a tablet, a connected television, a specific terminal, or even a connected object from the Internet of Things (IoT). On these terminals, not only the data have to be loaded from and sent to the server, but also the applications and even complete working environments. Thus, Information and Communication Technology (ICT) architectures must be designed with the possibility to render the applications on the terminal without installation. This is the new generation of cloud computer architecture, which involves deploying virtualization software that combines workstation virtualization and application virtualization. These virtualization solutions provide mobility, or even a form of ultra-mobility, that is now necessary and indispensable for users to be more effective in achieving their tasks.
In virtual desktops, launching an application can take 15 to 20 seconds, while loading a web page from the Internet takes less than a second. The requirements of the users are natural: the ICT architecture must evolve to satisfy them and to tend towards real time. Therefore, the challenge consists in the analysis of behaviors and the classification of users in order to predict their future activities.
By predicting the users' activities, it becomes possible to anticipate the opening of a session or the launch of an application; thus, the data and other necessary resources can be made available in advance, i.e., sessions and applications can be pre-loaded. Predicting the users' activities is also a way to optimize the resources on the server side [1].
The prediction algorithms can be based on recent advances in Machine Learning (ML) theories [2]. They are appropriate to handle the large amount of information collected from the users and to digest the heterogeneity of the generated data. Moreover, ML techniques are able to evaluate their own effectiveness. The learning capabilities of these techniques allow them to constantly adapt to changes in user behavior and to achieve virtualization of workstations that evolves according to the needs.
The rest of the paper is organized as follows. The next section describes User Behavior Analytics (UBA) issues. Section III reviews ML approaches for UBA applications. Specific learning techniques are presented in Section IV, and implementation aspects and results are provided in Section V. Finally, conclusions are drawn in Section VI.
II. BEHAVIOR ANALYSIS AND USER DETECTION
For several years, behavioral analysis has been the focus of intense efforts in marketing applications [3]. Obviously, the objective is to adopt new, specific and efficient marketing strategies that are based on data, i.e., recorded information that represents the past activities of potential clients. This is referred to as data-based behavioral marketing. Behavioral analysis has also found its usefulness in the fight against fraud and in various other applications [4]. It is therefore no surprise that behavioral analysis can enhance ICT, organize production tools more efficiently, detect internal threats such as targeted attacks, adapt software to the users, accelerate some repetitive tasks, etc. However, it requires a certain level of acceptance from the users [5].
A. Definition and Objective
A user model is a representation of a user or a group of
users in an ICT system [6], [7]. This model includes a set
of parameters and/or data that are representative of the user’s
past behavior.
The development of user models starts with the design
of systems able to collect all the data that are necessary
to represent the users. The data can be used to get a deep
understanding of the users [8]. In some contexts, the data
related to a single user are huge and it is necessary to
define reliable models of users or groups of users. These
models are made of features and parameters that represent
the users or groups of users in their activities performed
through various applications. These models can then be used
as a basis for providing personalized services to the users. Indeed, missing information can be retrieved, specific categories can be deduced, and future activities and behaviors can be predicted; all of this contributes to an interactive, adapted and personalized interaction with the users. Applications are varied: natural dialogue processing, speech transcription systems, new tools for business strategy and marketing, human resource management, security anomaly detection, etc. In the end, the main goal is always to improve the user experience.
B. Behavior Analysis
The UBA is the discipline of analyzing user behaviors. In an
operational way, it is essentially the collecting, monitoring and
processing of user data. The data sets collected from the users
are stored in databases, data log files, histories, directories, and
furthermore any other systems recording the user behaviors.
The purpose of this process is to provide parameters and to
build reliable and usable models of users, in other words, that
accurately characterize the users.
For example, the Internet has become a privileged space for this type of application [9]. Indeed, technologies are now mature, ready and widespread enough to collect and exploit the present and past behavior of individual Internet users in real time. Interests, attendance, actions, movements, attitudes, lifestyle, standard of living, etc. are deduced from the data sets produced by surfing on the Internet. Obviously, the status of a user can evolve and change at any time. Techniques make it possible to adapt the models based on experience and according to the evolution of the collected data in real time.
UBA relies on three pillars: data analysis, data integration and data presentation. In practice, the analysis and processing of the phenomenal amount of data is the most difficult challenge. The heterogeneity, volume and speed of data generation are increasing rapidly. This is exacerbated by the use of wireless networks, IoT sensors, smartphones and the increasing activities on the Internet. Therefore, real-time UBA must be fast in processing large amounts of data, and ML algorithms are appropriate candidates [10]. For that purpose, ML algorithms must run in real time, access the whole data sets and adapt their own parameters, i.e., learn. ML algorithms can also be interfaced with enterprise resource planning software to get additional information about the users and to combine it with their past and present activities during processing. The idea is to enable the establishment of self-adaptive models.
III. MACHINE LEARNING ALGORITHMS FOR UBA
ML techniques represent a branch of statistics and computer science that studies the algorithms and architectures capable of learning from observed facts, i.e., measured data [2], [11]–[13]. These techniques include artificial neural networks with supervised learning, Bayesian decision theory, parametric, semi-parametric and non-parametric methods, multivariate analysis, hidden Markov models, reinforcement learning, kernel estimators, graphical models, statistical tests, etc.
Through a learning process, ML methods are able to self-adjust their own parameters from a data set. This data set contains all the coherent information that is necessary, for example, to carry out a classification, modeling or prediction task. Furthermore, it is often necessary to separate all the available data into two sub-sets: 1) the learning set, which is used to learn or to calculate the optimal parameters of the learning machine; 2) the test set, which is used to verify the performance of the machine after learning from the previous set.
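As a minimal illustration of this separation (a sketch with synthetic data; the paper does not specify a split ratio, so the 80/20 proportion below is an assumption):

```python
import numpy as np

# Synthetic stand-in for a set of recorded user events (hypothetical data).
rng = np.random.default_rng(seed=0)
data = rng.random((1000, 8))          # 1000 examples, 8 features each

# Shuffle, then separate into the two sub-sets described above.
indices = rng.permutation(len(data))
split = int(0.8 * len(data))          # assumed 80% learning / 20% test
learning_set = data[indices[:split]]  # used to fit the model parameters
test_set = data[indices[split:]]      # used to verify performance after learning
```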
The quickly growing amount of data collected via the Internet and the IoT has promoted the development of ML techniques [14]. Many companies already have their own data harvesting tools, and they are now faced with the challenge of exploiting them in an effective and relevant way.
The user model that must be used varies according to the
applications and the objectives [15]. User models may seek to
describe:
1) The cognitive processes underlying the user’s actions;
2) A difference between user’s skills and expert skills;
3) Behavioral patterns or user preferences;
4) The characteristics of the user.
The first applications of ML techniques for UBA were centered on the first two types of models. More recently, research activities have focused on developing the third type of model, trying to find out user preferences. Finally, the applications of ML techniques aimed at discovering the characteristics of the users, i.e., related to the fourth type of the previous list, remain scarce. Today, it is the scientific issue that is the most interesting to explore and that attracts the most attention. In the design of user models, it is important to distinguish between approaches that model individual users and those that model communities, classes, or groups of users.
The major limitations in implementing and using automatic ML techniques for UBA purposes are the following:
- The amount of data, which sometimes requires very large computing capabilities. Indeed, in most situations, ML algorithms require a relatively large number of examples to be precise.
- The validity of the data included in the learning set, which is necessary for a ML algorithm to build user models with an acceptable accuracy. In other words, how can we be sure that a data set corresponds exactly to a type of user, to one of its behaviors, to atypical and/or abnormal behaviors, to changes in the behavior of users, etc.? A simplistic strategy is to use a large amount of data to compensate for uncertainties, exceptions, deviations, etc.
IV. UBA IN VIRTUAL DESK ENVIRONMENTS
A. Workstation Virtualization Context
Virtualization of workstations is a logical evolution of digital transformation. It allows employees to work with fewer constraints and, at the same time, it reduces the costs for the ICT administrator. Indeed, this type of infrastructure has the advantage of significantly reducing the maintenance and client-side management tasks while providing the employee with his or her full working environment (settings, files, software, etc.), regardless of the hardware used. Such a cloud computer architecture is represented in Fig. 1.

Fig. 1. ICT general architecture for virtualized applications: terminals (PC, PDA, tablet) connected through an RDP channel and a load balancer to a management server, application servers, and file, database and directory servers hosting virtual PCs.
The benefits of workstation virtualization are multiple:
- It increases employee productivity and mobility through a single solution and without compromising security.
- It delivers remote access while respecting confidentiality, compliance, and risk management standards.
- It reduces computer-related costs and complexity by centralizing application and workstation management and by automating other common tasks on the server side.
- It frees up ICT resources by simplifying the management of applications, workstations and data.
The bulk of the configuration and calculation tasks is then fully concentrated on the server side. Companies such as Citrix (www.citrix.com) and Systancia (www.systancia.com) offer comprehensive and complete software solutions for virtualized desktops and applications. In this kind of cloud computer architecture, several servers are used, and they must be organized and designed to provide users with real-time access to data and applications. A strategy to allocate the computing loads between servers is required, and its implementation is achieved by the management server in Fig. 1. The efficiency of this strategy is improved by the prediction of the opening/closing of sessions and of the launching of remote applications for each user. By predicting the behavior of users, it becomes possible to improve their experience [16].
B. Proposed Algorithm for User Classification
Here, the objective is to classify each individual user ac-
cording only to his previous behaviors. To do this, the instants
when a remote application has been launched by a user are
recorded. This allows to build a histogram from the instants of
application launches for each user. Then, a dissimilarity matrix
is calculated using the Jensen-Shannon divergence [17]. This
dissimilarity matrix is then used to project each dot, i.e., user,
that represents a histogram in a multidimensional space in
a new two-dimensional space by ensuring that the distances
between the dots (the dissimilarities) are preserved [18].
Finally, the dots in the new space with reduced dimensions
can be grouped to set up classes or categories of users. The
K-means algorithm [19] is used to determine similar groups
of users.
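The sketch below assumes per-user hourly histograms, uses SciPy's Jensen-Shannon distance for the dissimilarity matrix, metric MDS for the distance-preserving 2D projection (consistent with [18]), and K-means with k = 6 as in Section V. The data and dimensions are placeholders, not the paper's actual implementation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# Hypothetical per-user histograms of application-launch instants
# (100 users, 24 hourly bins); real histograms would come from the logs.
rng = np.random.default_rng(seed=0)
hists = rng.random((100, 24))
hists /= hists.sum(axis=1, keepdims=True)     # normalize to distributions

# Dissimilarity matrix based on the Jensen-Shannon divergence
# (SciPy returns its square root, the Jensen-Shannon distance).
n = len(hists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = jensenshannon(hists[i], hists[j])

# Distance-preserving projection to 2D, then grouping with K-means.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(coords)
```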
C. Proposed Algorithm for the Prediction of Application
Launches
Now, the objective is to predict the instant when a user will launch a remote application. The proposed prediction algorithm is based only on the user's past behaviors. The instants when a user has launched a remote application have been recorded and are available at any time.
As an example, the time interval in which the user will start the first application of the day is predicted. A time granularity is used to estimate the discrete probability distribution P(H | WD, U), where H = 0, ..., 23 is the hour, WD ∈ {Mo, Tu, We, Th, Fr, Sa, Su} is the day of the week, and U represents a user. For each user, the opening of the day's applications is predicted by calculating the interval, or potentially multiple intervals, in which the event is expected to occur. This is achieved with a very fine granularity by estimating the probability distribution of launches over time with a Kernel Density Estimator (KDE) [20]. This kernel estimation technique (the Parzen-Rosenblatt method) is a non-parametric method for estimating the probability density of a random variable. The estimator is formed by the average of Gaussian curves called kernels. To take the periodicity of the data into account, a circular distribution function has been chosen as the basic function defining a kernel.

Fig. 2. View of an application launch with the virtualization software AppliDis Fusion from Systancia.
The periodicity is defined for a user by the average time elapsed between two application launches. It can be very different from one user to another. We use an envelope of several Gaussians to obtain an estimate of the probability density, as sketched below.
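The paper does not name the circular basis function; the von Mises kernel used in the following sketch is the standard circular analogue of a Gaussian, and the period, concentration parameter, and data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import vonmises

def circular_kde(event_times, period, kappa, grid_size=288):
    """KDE of launch times on a circle of length `period` (e.g., 24 h),
    using von Mises kernels; `kappa` acts as an inverse bandwidth."""
    grid = np.linspace(0.0, period, grid_size, endpoint=False)
    # Map recorded instants and the evaluation grid onto angles.
    angles = 2.0 * np.pi * (np.asarray(event_times) % period) / period
    grid_angles = 2.0 * np.pi * grid / period
    # Average of one circular kernel centered on each recorded launch.
    density = np.mean(
        [vonmises.pdf(grid_angles, kappa, loc=a) for a in angles], axis=0)
    return grid, density * 2.0 * np.pi / period  # back to the time scale

# Hypothetical launch instants (in hours) for one user over several days.
launches = [8.9, 9.1, 13.5, 9.0, 14.0, 8.7]
hours, pdf = circular_kde(launches, period=24.0, kappa=20.0)
```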
1) Model Selection: The behavior of the users will, in the general case, be a composition of motifs with different periodicities. The challenge consists in finding a period T that will generate a probability distribution with a low differential entropy. For each period T, one or more bandwidths of the kernels can be applied, and we have chosen the cross-validation technique to select the bandwidth [20].
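The exact cross-validation procedure is not detailed in the paper (it cites [20]); the sketch below uses a common leave-one-out likelihood criterion over a hypothetical grid of kernel concentrations (inverse bandwidths):

```python
import numpy as np
from scipy.stats import vonmises

def loo_log_likelihood(angles, kappa):
    """Leave-one-out log-likelihood of the circular KDE for one bandwidth."""
    total = 0.0
    for i, a in enumerate(angles):
        others = np.delete(angles, i)
        # Density at the held-out point under the kernels of all others.
        total += np.log(np.mean(vonmises.pdf(a, kappa, loc=others)) + 1e-300)
    return total

def select_kappa(event_times, period, candidates=(1, 5, 10, 20, 50, 100)):
    """Pick the candidate concentration maximizing the LOO likelihood."""
    angles = 2.0 * np.pi * (np.asarray(event_times) % period) / period
    scores = [loo_log_likelihood(angles, k) for k in candidates]
    return candidates[int(np.argmax(scores))]
```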
2) Optimization: The period T is determined by solving an optimization problem based on a cost function J. This makes it possible to adjust a compromise between the size of the interval, which increases the likelihood of covering an application launch, and the required computational resources (CPU and server RAM). Of course, the larger the interval, the better the prediction accuracy, but also the higher the computational costs. In the end, it is the system administrator who settles this compromise. In practical terms, for users, this translates on one side into preloaded applications that are kept idle but are fast to launch, and on the other side into longer delays for launching applications.
V. IMPLEMENTATION AND RESULTS
A. Practical Aspects
Fig. 3. Number of logons per hour in a French university hospital over a period of one week.

Fig. 4. Two-dimensional projection of the users (one dot per user) with 6 clusters separated with the K-means algorithm for user classification.

The algorithms have been implemented within AppliDis, a software solution for the virtualization of workstations and applications in a single management console. Clients of such software
are companies, large accounts, hospitals, and other big organizations with a large number of users, with users showing different profiles (office staff, doctors, technicians, nurses, etc.), with different needs, and where some users must access remote data and applications around the clock. The ICT cloud architecture is the one of Fig. 1, and Fig. 2 shows a remote application launch through AppliDis Fusion 5 by a user on his or her workstation. It is the virtualization software that provides access to the applications hosted on the servers. This software also stores the instants when an application has been requested, when it becomes available, when it is closed, etc.
Tests have been carried out on a real cloud computer system with virtualization of workstations and applications. In practical terms, data have been collected from a French university hospital, which includes around 800 users accessing 110 applications distributed over 35 servers around the clock. The resulting data set covers a period of approximately 12 months. The activities and the user behavior can be seen in the histogram in Fig. 3, which shows the number of logons during a full week with a resolution of 1 h. In this particular example, a user opens 5 applications per day on average. This number can vary from 1 to 82 applications per day depending on the user. The most frequent periodicities detected across all users are 24, 12, and 8 h. This means that some applications are launched every 24, 12 or 8 hours.
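The paper does not state how these periodicities were detected; one plausible approach, sketched below under that assumption, is to read off the dominant components of the Fourier spectrum of the hourly launch-count series:

```python
import numpy as np

def dominant_periods(hourly_counts, top=3, max_period=168.0):
    """Return the most prominent periodicities (in hours) of an hourly
    launch-count series, read off the discrete Fourier spectrum."""
    x = np.asarray(hourly_counts, dtype=float)
    x = x - x.mean()                            # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)      # cycles per hour
    order = np.argsort(spectrum[1:])[::-1] + 1  # strongest first, skip f = 0
    periods = 1.0 / freqs[order]
    return [float(p) for p in periods if p <= max_period][:top]

# A synthetic year of hourly counts with 24 h and 12 h rhythms.
t = np.arange(24 * 365)
counts = 5 + 3 * np.cos(2 * np.pi * t / 24) + np.cos(2 * np.pi * t / 12)
print(dominant_periods(counts))  # -> approximately [24.0, 12.0, ...]
```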
B. User Classification
The proposed user classification algorithm is evaluated on the data set and context previously described.
The algorithm makes it possible to distinguish 108 groups of users. The users have been projected into a high-dimensional space; in this case, a 108-dimensional space has been used. A user is represented by a dot where each coordinate represents the utilization periodicity of a certain application. A dimension reduction algorithm is used to convert the dots from this high-dimensional representation into a lower-dimensional one. This algorithm must preserve the distances, because two dots which are close in the 108-dimensional space represent two similar users. The result of converting the 108-dimensional space into a plane (2D) is presented in Fig. 4. The two dimensions have no units and are not related to any physical parameters. In this figure, there are 765 dots, each of which corresponds to a user. The simple K-means algorithm has been used to group the dots into clusters. We chose k = 6 clusters, and the color of each dot represents its cluster.
We compared the results obtained with the dimension reduction algorithm based on the dissimilarity matrix calculated with the Jensen-Shannon divergence to other dimension reduction techniques such as Principal Component Analysis (PCA) or Multidimensional Scaling (MDS). We noticed that the use of the dissimilarity matrix yields the best results.
C. Prediction of Application Launches
The proposed prediction algorithm is evaluated on the same
data set in order to estimate the future instant of remote
application launches by users.
What characterizes a user is the period T. The interval used to predict the launch of an application is chosen such that it covers a probability mass greater than a certain threshold. For each user, the number of predicted application launches is represented by a single number, the Area Under the Curve (AUC). This AUC varies from 0 to 1 and is directly related to the size of the interval taken into account for the prediction. This is shown in Fig. 5. In this figure, the yellow bars represent the probability distribution of application launches by hour for a weekday (Friday) for the user with id = 3, i.e., P(H | WD = Friday, U = 3). The red curve is the complementary cumulative probability mass function, i.e., the probability P(H > h | WD = Friday, U = 3). Obviously, this probability decreases as the time moves forward to the end of the day. Finally, the blue bars highlight the part of the probability distribution that is covered by the predicted interval from Hour = 10 to Hour = 21, i.e., P(10 ≤ H ≤ 21 | WD = Friday, U = 3).
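The paper does not give the interval-selection procedure explicitly; the sketch below illustrates the stated criterion (the predicted interval must cover a probability mass above a threshold) for a discrete hourly distribution, ignoring wrap-around across midnight. The threshold and distribution are assumptions:

```python
import numpy as np

def prediction_interval(pmf, threshold=0.9):
    """Smallest contiguous hourly interval whose total probability mass,
    under a discrete P(H | WD, U), reaches `threshold`. Wrap-around
    across midnight is ignored in this sketch."""
    best = None
    n = len(pmf)
    for start in range(n):
        for end in range(start, n):
            mass = pmf[start:end + 1].sum()
            if mass >= threshold:
                if best is None or (end - start) < (best[1] - best[0]):
                    best = (start, end, float(mass))
                break  # widening further from this start cannot be smaller
    return best  # (first hour, last hour, covered probability mass)

# Hypothetical hourly distribution for one user on Fridays.
pmf = np.zeros(24)
pmf[10:22] = 1.0 / 12.0                  # uniform mass between 10h and 21h
print(prediction_interval(pmf, 0.9))     # -> (10, 20, 0.916...)
```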
The entropy of the behavior makes it possible to estimate the upper bound of the interval, and the algorithm makes its predictions by trying to reach this limit.

Fig. 5. Example of the cumulative probability and the probability distribution for the launch of an application during a user session (UserId: 3; Weekday: Friday).

Fig. 6. Relationship between cumulative probability and the duration of the interval, here with examples for two different values of the AUC (0.657 and 0.769).

The exact value of the
interval’s upper bound cannot be known in advance, so it is not
possible to know whether a prediction is optimal or not. We
seek to improve the prediction performance at each iteration of
the algorithm by using an AUC as close as possible to 1. For example, Fig. 6 shows the cumulative probability as a function of the interval duration (as a percentage of the period T) for two different values of the AUC; a sketch of this computation follows. Finally, in this test, 97% of the users have predicted application launches, which means that their applications will be preloaded on their terminals.
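As an illustration of how such an AUC could be computed (a sketch under the assumption that, for each interval duration, the best-covering contiguous interval is used; the trapezoidal integration and the distribution are illustrative):

```python
import numpy as np

def coverage_curve(pmf):
    """Covered probability mass of the best contiguous interval, for every
    interval duration expressed as a fraction of the period (cf. Fig. 6)."""
    n = len(pmf)
    coverage = []
    for width in range(1, n + 1):
        masses = [pmf[s:s + width].sum() for s in range(n - width + 1)]
        coverage.append(max(masses))
    durations = np.arange(1, n + 1) / n
    return durations, np.asarray(coverage)

# AUC by trapezoidal integration of the coverage curve.
pmf = np.zeros(24)
pmf[10:22] = 1.0 / 12.0                  # same hypothetical distribution
d, c = coverage_curve(pmf)
auc = float((((c[:-1] + c[1:]) / 2.0) * np.diff(d)).sum())
print(auc)                               # closer to 1 = better predictions
```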
By loading the applications in advance with the prediction algorithm, the user experiences a reduced launch delay. For the administrator of the cloud computer architecture, the issue consists in finding an acceptable compromise between predictive performance, i.e., accelerated applications, and server resources, i.e., CPU, RAM, and power consumption. Our behavior analysis tools are integrated in the virtualization software and are available to the administrator, who can view the system performance and additional indicators calculated and predicted by the ML techniques.
VI. CONCLUDING REMARKS
We presented an implementation of ML algorithms for user behavior analysis purposes in a cloud computing environment that combines workstation virtualization and application virtualization. Our user behavior analysis consists in the classification of users and in the prediction of some of their activities, such as application launches. The concept of UBA (User Behavior Analytics) has been used in this specific context.
Dissimilarity measures and a data clustering method (K-means) have enabled the identification of groups of similar or closely related users. Then, the time interval in which a user will launch an application has been predicted by using a non-parametric probability density estimation method, namely a Kernel Density Estimator (KDE).
These algorithms have been implemented within a workstation and application virtualization software that is able to track and visualize users' activity and behavior in real time. Thanks to the aforementioned algorithms, the virtualization software is thus also able to predict in real time the opening of users' sessions and applications, whatever the periodicity of the past information. This has been verified in a working environment and under real operating conditions. A performance analysis shows that the machine learning techniques are effective in clustering the users and in predicting their behaviors.
The proposed solution aims to ensure a fast remote access
to the applications for the user while reducing the maintenance
costs for the ICT architecture administrator.
ACKNOWLEDGMENT
The authors would like to thank the Systancia company for
supporting this work and providing anonymized data from a
cloud computer system.
REFERENCES
[1] G. Warkozek, V. Debusschere, and S. Bacha, “Automated parameters
retrieval for energetic model identification of servers in datacenters,” in
IEEE PowerTech, 2013, Conference Proceedings.
[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning: Data mining, inference and prediction, ser. Springer Series in
Statistics. Springer-Verlag, 2013.
[3] G. R. Foxall, “Behavior analysis and consumer psychology,” Journal of
Economic Psychology, vol. 15, no. 1, pp. 5 – 91, 1994.
[4] A. R. Baig and H. Jabeen, “Big data analytics for behavior monitoring
of students,” Procedia Computer Science, vol. 82, pp. 43–48, 2016.
[5] J. Barcenilla and J.-M.-C. Bastien, “Acceptability of innovative technolo-
gies: Relationship between ergonomics, usability, and user experience,”
Le travail humain, vol. 72, no. 4, pp. 311–331, 2009.
[6] A. Kobsa, “User modeling: Recent work, prospects and hazards,” Human
Factors in Information Technology, vol. 10, pp. 111–111, 1993.
[7] ——, “Generic user modeling systems,” User Modeling and User-
Adapted Interaction, vol. 11, no. 1, pp. 49–63, 2001.
[8] O. Bent, P. Dey, K. Weldemariam, and M. K. Mohania, “Modeling user
behavior data in systems of engagement,” Future Generation Computer
Systems, vol. 68, pp. 456–464, 2017.
[9] M. Pazzani and D. Billsus, “Learning and revising user profiles: The
identification of interesting web sites,” Machine Learning, vol. 27, no. 3,
pp. 313–331, 1997.
[10] R. F. Molanes, K. Amarasinghe, J. J. Rodriguez-Andina, and M. Manic,
“Deep learning and reconfigurable platforms in the internet of things,”
IEEE Industrial Electronics Magazine, vol. 12, no. 2, pp. 36–49, 2018.
[11] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.
[12] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The
MIT Press, 2012.
[13] E. Alpaydin, Machine Learning: The New AI. The MIT Press, 2016.
[14] I. Szilagyi and P. Wira, “An intelligent system for smart buildings using
machine learning and semantic technologies: A hybrid data-knowledge
approach,” in 1st IEEE International Conference on Industrial Cyber-
Physical Systems (ICPS 2018), 2018, pp. 20–25.
[15] G. I. Webb, M. J. Pazzani, and D. Billsus, “Machine learning for user
modeling,” User modeling and user-adapted interaction, vol. 11, no.
1-2, pp. 19–29, 2001.
[16] M. Callara and P. Wira, “Machine learning pour l’analyse de com-
portements et la classification d’utilisateurs,” in Congrès National de la Recherche des IUT (CNRIUT’2017), 2017.
[17] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles
and Techniques - Adaptive Computation and Machine Learning. The
MIT Press, 2009.
[18] T. F. Cox and M. A. A. Cox, Multidimensional scaling, 2nd ed., ser.
Monographs on statistics and applied probability. Boca Raton, Fla.:
Chapman & Hall/CRC, 2001, no. 88.
[19] R. Xu and D. C. Wunsch, “Survey of clustering algorithms,” IEEE
Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.
[20] C. O. Wu, “A Cross-Validation Bandwidth Choice for Kernel Density
Estimates with Selection Biased Data,” Journal of Multivariate Analysis,
vol. 61, no. 1, pp. 38 – 60, 1997.