User Behavior Analysis with Machine Learning
Techniques in Cloud Computing Architectures
Matias Callara ∗, Patrice Wira ∗
∗Université de Haute-Alsace, IRIMAS Laboratory, 68093 Mulhouse, France
Email: {matias-ezequiel.callara, patrice.wira}@uha.fr
Abstract—This paper presents the use of machine learning
algorithms to analyze the behaviors of users working in a dis-
tributed computer environment. The objective consists in discrim-
inating groups of close users. These groups are composed of users
with similar behaviors. Events related to the users’ behaviors are
recorded and transferred to a database. An approach is developed
to determine the groups of the users. A non-parametric method
of estimating a probability density is used to predict application
launches and session openings in an individual way for each user.
These algorithms have been implemented and demonstrated their
effectiveness within a complete virtualization environment for
workstations and applications under real conditions in a hospital.
Index Terms—Machine learning, user behavior analytics, be-
havior analysis, user classification, prediction
I. INTRODUCTION
Working software environments, in the broadest sense of
the term, can no longer do without remote capabilities. More
recently, it has been seen that mobility and productivity are
linked, since the first improves the second. Indeed, in a form
of ultimate culmination of mobility, the user’s workstation
can be a notebook computer but also a smartphone or any
other connected object like a tablet, a connected television,
a specific terminal, or even a connected object from the Internet
of Things (IoT). In these terminals, not only the data have to
be loaded from and sent to the server but also the applications
and even complete working environments. Thus, Information
and Communication Technology (ICT) architectures must be
designed with the possibility to render the applications on the
terminal and without installation. This is the new generation
of a cloud computer architecture that involves deploying vir-
tualization software that combines workstation virtualization
and application virtualization. These virtualization solutions
provide mobility, or even a form of ultra-mobility, that is now
indispensable for users to be more effective in achieving their
tasks.
In virtual desktops, launching an application can take 15
to 20 seconds while loading a web page from the Internet
takes less than a second. The requirements of the users are
natural, the ICT architecture must evolve to satisfy them and
to tend towards real time. Therefore, the challenge consists
in the analysis of behaviors and the classification of users to
predict their future activities.
By predicting the users’ activities, it will be possible to
anticipate the opening of a session or the launch of an
application and thus, the data and other resources that are
necessary can be made available in advance, i.e., sessions and
applications can be pre-loaded. Predicting the users’ activities
is also a way to optimize the resources on the server side [1].
The prediction algorithms can be based on recent advances
in Machine Learning (ML) theories [2]. They are appropriate
to handle the large amount of information collected from the
user and to digest the heterogeneity of the generated data.
Obviously, ML techniques are able to evaluate their own ef-
fectiveness. The learning capabilities of these techniques allow
them to constantly adapt to changes in users’ behavior and to
achieve virtualization of workstations that evolve according to
the needs.
The rest of the paper is organized as follows. The next
section describes User Behavior Analytics (UBA) issues. Section III reviews ML approaches for UBA applications. Specific
learning techniques are presented in Section IV and implemen-
tation aspects and results are provided in Section V. Finally,
conclusions are drawn in Section VI.
II. BEHAVIOR ANALYSIS AND USER DETECTION
For several years, behavioral analysis has been the focus
of intense efforts in marketing applications [3]. Obviously, the
objective is to adopt some new specific and efficient marketing
strategies that are based on data, i.e., recorded information that
represent the past activities of potential clients. This is referred
to as data-based behavioral marketing. Behavioral analysis has
also found its usefulness in the fight against fraud and in
various other applications [4]. Now, it is not a surprise to
see that behavioral analysis can enhance ICT, organize more
efficiently production tools, detect internal threats like targeted
attacks, adapt software to the users, accelerate some repetitive
tasks, etc. However, it requires a certain level of acceptability
from the users [5].
A. Definition and Objective
A user model is a representation of a user or a group of
users in an ICT system [6], [7]. This model includes a set
of parameters and/or data that are representative of the user’s
past behavior.
The development of user models starts with the design
of systems able to collect all the data that are necessary
to represent the users. The data can be used to get a deep
understanding of the users [8]. In some contexts, the data
related to a single user are huge and it is necessary to
define reliable models of users or groups of users. These
M. Callara and P. Wira, "User Behavior Analysis with Machine Learning Techniques in Cloud Computing Architectures,"
International Conference on Applied Smart Systems (ICASS 2018), Médéa, Algeria, 24-25 November, 2018.
DOI: 10.1109/ICASS.2018.8651961
models are made of features and parameters that represent
the users or groups of users in their activities performed
through various applications. These models can then be used
as a basis for providing personalized services to them. Indeed,
missing information can be retrieved, specific categories can
be deduced, future activities and behaviors can be predicted
and all this helps to an interactive, adapted and personalized
interaction with the users. Applications are various, in natural
dialogue processing, in speech transcription systems, in new
tools for business strategy and marketing, in human resource
management, to detect security anomalies, etc. At the very
end, the main goal is always to improve the user experience.
B. Behavior Analysis
The UBA is the discipline of analyzing user behaviors. In an
operational way, it is essentially the collecting, monitoring and
processing of user data. The data sets collected from the users
are stored in databases, data log files, histories, directories,
and any other systems recording the user behaviors.
The purpose of this process is to provide parameters and to
build reliable and usable models of users, in other words, that
accurately characterize the users.
For example, the Internet has become a privileged space
for this type of application [9]. Indeed, technologies are now
mature, ready and spread out in order to collect and exploit
the present and past behavior of individual Internet users in
real time. The interests, the attendances, the facts and gestures,
the movements, the attitudes, the lifestyle, the living standard,
etc. are deduced from data sets being produced by surfing on
the Internet. Obviously, the status of a user can evolve and
change at any time. Techniques make it possible to adapt the
models on the basis of the experiment and according to the
evolutions of the collected data in real time.
The UBA relies on three pillars: data analysis, data integration and data presentation. Actually, analyzing and
processing the phenomenal amount of data is the most difficult challenge. The heterogeneity, volume and speed of data
generation are increasing rapidly. This is exacerbated with
the use of wireless networks, IoT sensors, smartphones and
the increasing activities on the Internet. Therefore, real time
UBA must be fast in processing the big amount of data and
ML algorithms should be appropriate candidates [10]. For that
purpose, ML algorithms must run in real time, access the
whole data sets, and adapt their own parameters, i.e., learn.
ML algorithms can also be interfaced with enterprise resource
planning software to get additional information about the
users and to combine them with their past and present activities
while processing. The idea is to enable the establishment of
self-adaptive models.
III. MACHINE LEARNING ALGORITHMS FOR UBA
The techniques of ML represent a branch of statistics and
computer science that studies the algorithms and architectures
capable of learning from the observed facts, i.e., measured
data [2], [11]–[13]. These techniques include artificial neural
networks with supervised learning, Bayesian decision the-
ory, parametric, semi-parametric and non-parametric methods,
multivariate analysis, hidden Markov models, reinforcement
learning, kernel estimators, graphical models, statistical tests...
ML methods, through a learning process, are able to self-adjust their own parameters from a data set. This set of data
contains all the coherent information that is necessary, for
example, to carry out a classification, modeling or prediction
task. Furthermore, it is often necessary to separate all the
data available in two sub-sets: 1) The learning set which is
used to learn or to calculate the optimal parameters of the
learning machine; 2) The test set which is used to verify the
performance of the machine after learning from the previous
set.
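The separation described above can be sketched as follows; this is a minimal illustration in Python, where the function name and the 80/20 ratio are our own choices, not taken from the paper:

```python
import random

def learning_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data set and separate it into a learning set and a test set."""
    rng = random.Random(seed)
    shuffled = data[:]  # copy so the original data set is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # 1) learning set: used to calculate the optimal parameters;
    # 2) test set: used to verify the performance after learning.
    return shuffled[n_test:], shuffled[:n_test]

learning_set, test_set = learning_test_split(list(range(100)))
print(len(learning_set), len(test_set))  # 80 20
```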
The quickly growing amount of data collected via the
Internet and IoT has promoted the developments of ML
techniques [14]. Many companies have already their own data
harvesting tools and now they are faced with the challenge of
exploiting them in an effective and relevant way.
The user model that must be used varies according to the
applications and the objectives [15]. User models may seek to
describe:
1) The cognitive processes underlying the user’s actions;
2) A difference between user’s skills and expert skills;
3) Behavioral patterns or user preferences;
4) The characteristics of the user.
The first applications of ML techniques for UBA were
centered on the first two types of models. Most recently,
research activities focus on developing the third type of model
and try to find out user preferences. Finally, the applications
of ML techniques aimed at discovering the characteristics of
the users - i.e., related to the fourth type of the previous list -
remain scarce. Today it is the scientific issue that is the most
interesting to explore and that attracts the most attention. In the
design of user models, it is important to distinguish between
approaches to model individual users or communities, classes,
groups of users.
The major limitations in implementing and using automatic
ML techniques for UBA purposes are the following:
•The amount of data that sometimes requires very large
computing capabilities. Indeed, in most situations, ML
algorithms require a relatively large number of examples
to be precise.
•The validity of the data included in the learning set and
which is necessary for a ML algorithm to build user
models with an acceptable accuracy. In other words, how
can we be sure that a data set corresponds exactly to a
type of user, to one of its behaviors, to atypical and/or ab-
normal behaviors, to changes in the behavior of users...?
A simplistic strategy is to use a large amount of data to
compensate for uncertainties, exceptions, deviations, etc.
IV. UBA IN VIRTUAL DESK ENVIRONMENTS
A. Workstation Virtualization Context
Virtualization of workstations is a logical evolution of
digital transformation.

Fig. 1. ICT general architecture for virtualized applications (terminals such as a PC, a PDA or a tablet access the application, file, database, directory and management servers and the virtual PCs through an RDP channel and a load balancer).

It allows employees to work with fewer
constraints and at the same time it reduces the costs for the
ICT administrator. Indeed, this type of infrastructure has the
advantage of significantly reducing the maintenance and client-
side management tasks while providing the user employee
with its full working environment (settings, files, software,
etc.) and this whatever the material support. Such a cloud
computer architecture is represented by Fig. 1.
The benefits of workstation virtualization are multiple:
•It increases the employee productivity and mobility
through a single solution and without compromising the
security.
•It delivers remote access while respecting confidentiality,
compliance, and risk management standards.
•It reduces computer-related costs and complexity by centralizing application and workstation management, and by
automating other common tasks on the server side.
•It frees up the ICT resources by simplifying the management of applications, workstations and data.
The bulk of the configuration and calculation tasks is
then fully focused on the server side. Companies such as
Citrix1 and Systancia2 offer a comprehensive and complete
software solution for virtualized desktops and applications. In
this kind of cloud computer architecture, several servers are
used and they must be organized and designed to provide users
allocate the computing loads between servers is required and
its implementation is achieved by the management server in
Fig. 1. The efficiency of this strategy is improved by the
prediction of opening/closing sessions and of launching of
remote applications for each user. By predicting the behavior
of users, it becomes possible to improve their experience [16].
1www.citrix.com
2www.systancia.com
B. Proposed Algorithm for User Classification
Here, the objective is to classify each individual user according only to his previous behaviors. To do this, the instants
when a remote application has been launched by a user are
recorded. This makes it possible to build a histogram from the instants of
application launches for each user. Then, a dissimilarity matrix
is calculated using the Jensen-Shannon divergence [17]. This
dissimilarity matrix is then used to project each dot, i.e., user,
that represents a histogram in a multidimensional space in
a new two-dimensional space by ensuring that the distances
between the dots (the dissimilarities) are preserved [18].
Finally, the dots in the new space with reduced dimensions
can be grouped to set up classes or categories of users. The
K-means algorithm [19] is used to determine similar groups
of users.
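The pipeline above (per-user histograms, Jensen-Shannon dissimilarity matrix, distance-preserving 2-D projection, K-means grouping) can be sketched as follows. This is an illustrative reconstruction from scipy/scikit-learn building blocks, not the authors' implementation; note that scipy's `jensenshannon` returns the square root of the divergence, hence the squaring:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def classify_users(histograms, n_clusters=6, seed=0):
    """histograms: (n_users, n_bins) array of per-user launch-instant histograms."""
    # Normalize each histogram into a discrete probability distribution.
    p = histograms / histograms.sum(axis=1, keepdims=True)
    n = len(p)
    # Pairwise dissimilarity matrix from the Jensen-Shannon divergence.
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = jensenshannon(p[i], p[j]) ** 2
    # Distance-preserving projection of each dot (user) into a 2-D space.
    xy = MDS(n_components=2, dissimilarity="precomputed",
             random_state=seed).fit_transform(d)
    # Group the 2-D dots into classes of similar users.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(xy)
    return xy, labels
```

The paper reports that this dissimilarity-matrix projection outperformed plain PCA or MDS on the raw coordinates; in this sketch that choice corresponds to feeding the Jensen-Shannon matrix to `MDS(dissimilarity="precomputed")`.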
C. Proposed Algorithm for the Prediction of Application
Launches
Now, the objective is to predict the instant when a user
will launch a remote application. The proposed prediction
algorithm is based only on the user’s past behaviors. The instants
when a user has launched a remote application have been
recorded and are available at any time.
As an example, the time interval in which the user will
start the first application in a day is predicted. A time
granularity is used to estimate the discrete probability distribution
P(H | WD, U), where H = 0, ..., 23 is the hour, WD =
Mo, Tu, We, Th, Fr, Sa, Su is the day of the week and U
represents a user. For each user, the opening of the day’s
applications is predicted by calculating the interval or po-
tentially multiple intervals in which the event is supposed
to occur. This is achieved with a very fine granularity by
estimating the probability distribution of launches over time by
a Kernel Density Estimator (KDE) [20].

Fig. 2. View of an application launch with the virtualization software AppliDis Fusion from Systancia.

It is a kernel estimation technique (the Parzen-Rosenblatt method), which is
a non-parametric method for estimating the probability density
of a random variable. The estimator is formed by the average
of the Gaussian curves called kernels. To take into account
the data periodicity, a circular distribution function has been
chosen as the basic function defining a kernel.
The periodicity is defined for a user by the average time
elapsed between two application launches. It can be very
different from one user to another one. We use an envelope
of several Gaussians to obtain an estimate of the probability
density.
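A minimal sketch of such a circular kernel estimator is given below. The paper only states that a circular distribution function defines the kernel; the von Mises kernel used here (the usual circular analogue of a Gaussian) and the fixed concentration parameter `kappa`, which acts as an inverse bandwidth, are our own assumptions:

```python
import numpy as np

def circular_kde(event_hours, period=24.0, kappa=8.0):
    """KDE on a circle of the given period (in hours).

    Each recorded launch instant contributes one von Mises kernel; the
    estimate is the average of these kernels, mirroring the envelope of
    curves described in the paper.
    """
    angles = 2.0 * np.pi * np.asarray(event_hours, dtype=float) / period

    def density(t):
        theta = 2.0 * np.pi * np.asarray(t, dtype=float) / period
        # von Mises kernels centred on each observed instant
        # (np.i0 is the modified Bessel function that normalizes the kernel).
        k = np.exp(kappa * np.cos(theta[..., None] - angles[None, ...]))
        k /= 2.0 * np.pi * np.i0(kappa)
        # Average the kernels and rescale from angle units to density per hour.
        return k.mean(axis=-1) * (2.0 * np.pi / period)

    return density

pdf = circular_kde([8.0, 8.5, 9.0, 14.0], period=24.0)
```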
1) Model Selection: The behavior of the users will, in
the general case, be a composition of motifs with different
periodicities. The challenge consists in finding a period T that
will generate a probability distribution with a low differential
entropy. For each period T, one or more bandwidths of the
kernels can be applied and we have chosen the cross-validation
technique to select the bandwidth [20].
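As an illustration of selecting the bandwidth by cross-validation, the following sketch scores candidate bandwidths by held-out log-likelihood with scikit-learn. It uses a plain Gaussian kernel on the real line rather than the circular kernel of the paper, since scikit-learn's `KernelDensity` offers no circular kernel; the candidate grid is our own choice:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def select_bandwidth(launch_hours, bandwidths=np.linspace(0.25, 3.0, 12), cv=5):
    """Pick the kernel bandwidth maximizing the cross-validated log-likelihood."""
    x = np.asarray(launch_hours, dtype=float).reshape(-1, 1)
    # Each fold fits a KDE on the training launches and scores the
    # held-out launches by their total log-likelihood.
    search = GridSearchCV(KernelDensity(kernel="gaussian"),
                          {"bandwidth": bandwidths}, cv=cv)
    search.fit(x)
    return search.best_params_["bandwidth"]
```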
2) Optimization: The period T is determined by the resolution of an optimization problem based on a cost function
J. This makes it possible to adjust a compromise between the size of the interval, in order to increase the
likelihood of covering an application launch, and the computational
resources (CPU and server RAM). Of course, the larger the
interval, the better the prediction accuracy and the bulkier the
required computational costs. In the end, it is the system
administrator who settles this compromise. In practical terms,
for users this amounts on one side to preloaded applications
kept idle but fast to launch and on the other side to larger
delays for launching applications.
V. IMPLEMENTATION AND RESULTS
A. Practical Aspects
The algorithms have been implemented within AppliDis,
a software for the virtualization of workstations and applications
in a single management console.

Fig. 3. Number of logons per hour in a French university hospital over a period of a week.

Fig. 4. Two-dimensional projection of the users (1 dot for 1 user) with 6 clusters separated with the K-means algorithm for user classification.

Clients of such software
are companies, large accounts, hospitals, or other big orga-
nizations with a large amount of users, with users showing
different profiles (office staff, doctors, technicians, nurses,
etc.), with users requiring different needs and where some
users must access remote data and applications around the clock.
The ICT cloud architecture is the one of Fig. 1, and Fig. 2
shows the remote application launch through AppliDis Fusion
5 by a user on his workstation. It is the virtualization software
that allows access to the applications hosted on the servers. This
software also stores the instants when an application has been
requested, when it is available, when it is closed, etc.
Tests have been achieved on a real cloud computer system
with the virtualization of workstations and applications. In
practical terms, data have been collected from a French
university hospital which includes around 800 users accessing
110 applications distributed over 35 servers around the clock. The
resulting data set represents a period of approximately 12
months. The activities and the user behavior can be seen by
the histogram in Fig. 3 which shows the number of logons
during a full week with a resolution of 1h. In this particular
example, a user opens 5 applications per day in average. This
number can vary from 1 to 82 applications per day depending
on the user. The most frequent periodicities detected in all the
users are 24, 12, and 8h. This means that some applications
are launched every 24, 12 or 8 hours.
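One simple way to surface such periodicities from the hourly logon counts is to rank candidate lags by autocorrelation. The paper does not state how the periodicities were detected, so the following is purely our own illustration; the candidate list mirrors the reported 24, 12 and 8 h periods:

```python
import numpy as np

def dominant_periods(hourly_counts, candidates=(24, 12, 8), top=3):
    """Rank candidate periods (in hours) by autocorrelation of the log counts."""
    x = np.asarray(hourly_counts, dtype=float)
    x = x - x.mean()  # remove the constant activity level
    scores = {}
    for lag in candidates:
        if lag < len(x):
            # Normalized correlation of the signal with itself shifted by `lag`.
            scores[lag] = float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```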
B. User Classification
The proposed user classification algorithm is evaluated on
the data set and context previously described.
The algorithm makes it possible to distinguish 108 groups of users.
The users have been projected in a high-dimensional space.
In this case, a 108-dimensional space has been used. A user
is represented by a dot where each coordinate represents the
utilization periodicity of a certain application. A dimension
reduction algorithm is used to convert the dots from the high-dimensional
into a lower-dimensional representation. This algorithm must
preserve the distances because 2 dots which are close in the
108-dimensional space represent 2 similar users. The result
of converting the 108-dimensional space into a plane (2D) is
presented by Fig. 4. The 2 dimensions have no units and are
not related to any physical parameters. In this figure, there
are 765 dots where each corresponds to a user. The simple K-means
algorithm has been used to group the dots into clusters.
We chose k = 6 clusters, and the color of each dot indicates its
cluster.
cluster.
We compared the results obtained with the dimension re-
duction algorithm based on the dissimilarity matrix calculated
with the Jensen-Shannon divergence to other dimension reduc-
tion techniques like Principal Component Analysis (PCA) or
Multidimensional Scaling (MDS). We noticed that the use of
the dissimilarity matrix generates the best results.
C. Prediction of Application Launches
The proposed prediction algorithm is evaluated on the same
data set in order to estimate the future instant of remote
application launches by users.
What characterizes a user is the period T. The choice of
the interval to predict the launch of an application is achieved
by knowing that this interval must cover a probability mass
greater than a certain threshold. For each user, the number of
predicted application launches is represented by a single num-
ber, the Area Under the Curve (AUC). This AUC varies from
0 to 1 and is directly related to the size of the interval taken
into account for the prediction. This is shown in Fig. 5. On this
figure, the yellow bars represent the probability distribution of
application launches by hour for a weekday (Friday) for the
user with id = 3, P(H | WD = Friday, U = 3). The red curve
is the complementary cumulative probability mass function,
that is, the probability P(H > h | WD = Friday, U = 3).
Obviously, the probability decreases as the time moves
forward to the end of the day. Finally, the blue bars
highlight the part of the probability distribution that is covered
by the predicted interval from Hour = 10 to Hour = 21, that
is, P(10 ≤ H ≤ 21 | WD = Friday, U = 3).
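The choice of a predicted interval covering a given probability mass, as described above, can be sketched as follows. This is an illustrative search for the smallest contiguous interval of hours whose mass exceeds the threshold; the threshold value is our own assumption and intervals wrapping around midnight are not handled:

```python
import numpy as np

def prediction_interval(hourly_probs, threshold=0.8):
    """Smallest contiguous hour interval whose probability mass meets the threshold.

    hourly_probs: the discrete distribution P(H | WD, U) over the 24 hours.
    Returns (start_hour, end_hour, covered_mass).
    """
    p = np.asarray(hourly_probs, dtype=float)
    best = None
    for start in range(len(p)):
        for end in range(start, len(p)):
            mass = p[start:end + 1].sum()
            if mass >= threshold:
                width = end - start
                if best is None or width < best[0]:
                    best = (width, start, end, mass)
                break  # widening further only grows the interval
    if best is None:  # no interval reaches the threshold: cover the whole day
        return 0, len(p) - 1, float(p.sum())
    return best[1], best[2], float(best[3])
```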
Fig. 5. Example of the cumulative probability and the probability distribution for the launch of an application during a user session (logins (AppliDis session) start dates histogram; UserId: 3; Weekday: Friday).

Fig. 6. Relationship between the cumulative probability and the duration of the interval, expressed as a percentage of the total period T, with examples for 2 different values of the AUC (0.657 and 0.769).

The entropy of the behavior allows the estimation of the
upper bound of the interval, and the algorithm will make the
predictions by trying to reach this limit. The exact value of the
interval’s upper bound cannot be known in advance, so it is not
possible to know whether a prediction is optimal or not. We
seek to improve the prediction performance at each iteration of
the algorithm by using an AUC as close as possible to 1. For
example, Fig. 6 shows the cumulative probability according to
a part (in percent) of the interval with two different values of
the AUC. Finally, in this test, application launches were predicted
for 97% of the users, which means that the applications will
then be preloaded on their terminals.
By loading the applications in advance with the prediction
algorithm, the user experiences a reduced waiting delay. For the
administrator of the cloud computer architecture, the issue
consists in finding an acceptable compromise between pre-
dictive performance, i.e., accelerated applications, and server
resources, i.e., CPU, RAM, and power consumption. Our
behavior analysis tools are integrated in the virtualization
software and are available to the administrator who can view
the system performances and additional indicators calculated
and predicted by the ML techniques.
VI. CONCLUDING REMARKS

We presented an implementation of ML algorithms for user
behavior analysis purposes in a cloud computing environment
that combines workstation virtualization and application virtualization. Our user behavior analysis consists in the classification of users and in the prediction of some of their activities,
such as application launches. The concept of UBA (User
Behavior Analytics) has been used in this specific context.
Dissimilarity measures and data clustering methods (K-
means) have enabled the identification of groups of similar
or closely related users. Then, the time interval in which a
user will launch an application has been predicted by using
a non-parametric method for estimating a probability density,
namely a Kernel Density Estimator (KDE).
These algorithms have been implemented within a work-
station and application virtualization software that is able
to track and visualize users’ activity and behavior in real
time. Thanks to the algorithms previously mentioned, the
virtualization software is thus also able to predict in real time
the openings of sessions and applications of users regardless
of the periodicity of the past information. This has been
verified in a working environment and under real operation
conditions. A performance analysis shows that the machine
learning techniques are effective in clustering the users and in
predicting their behaviors.
The proposed solution aims to ensure a fast remote access
to the applications for the user while reducing the maintenance
costs for the ICT architecture administrator.
ACKNOWLEDGMENT
The authors would like to thank the Systancia company for
supporting this work and providing anonymized data from a
cloud computer system.
REFERENCES
[1] G. Warkozek, V. Debusschere, and S. Bacha, “Automated parameters
retrieval for energetic model identification of servers in datacenters,” in
IEEE PowerTech, 2013, Conference Proceedings.
[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning: Data mining, inference and prediction, ser. Springer Series in
Statistics. Springer-Verlag, 2013.
[3] G. R. Foxall, “Behavior analysis and consumer psychology,” Journal of
Economic Psychology, vol. 15, no. 1, pp. 5 – 91, 1994.
[4] A. R. Baig and H. Jabeen, “Big data analytics for behavior monitoring
of students,” Procedia Computer Science, vol. 82, pp. 43–48, 2016.
[5] J. Barcenilla and J.-M.-C. Bastien, “Acceptability of innovative technolo-
gies: Relationship between ergonomics, usability, and user experience,”
Le travail humain, vol. 72, no. 4, pp. 311–331, 2009.
[6] A. Kobsa, “User modeling: Recent work, prospects and hazards,” Human
Factors in Information Technology, vol. 10, pp. 111–111, 1993.
[7] ——, “Generic user modeling systems,” User Modeling and User-
Adapted Interaction, vol. 11, no. 1, pp. 49–63, 2001.
[8] O. Bent, P. Dey, K. Weldemariam, and M. K. Mohania, “Modeling user
behavior data in systems of engagement,” Future Generation Computer
Systems, vol. 68, pp. 456–464, 2017.
[9] M. Pazzani and D. Billsus, “Learning and revising user profiles: The
identification of interesting web sites,” Machine Learning, vol. 27, no. 3,
pp. 313–331, 1997.
[10] R. F. Molanes, K. Amarasinghe, J. J. Rodriduez-Andina, and M. Manic,
“Deep learning and reconfigurable platforms in the internet of things,”
IEEE Industrial Electronics Magazine, vol. 12, no. 2, pp. 36–49, 2018.
[11] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.
[12] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The
MIT Press, 2012.
[13] E. Alpaydin, Machine Learning: The New AI. The MIT Press, 2016.
[14] I. Szilagyi and P. Wira, “An intelligent system for smart buildings using
machine learning and semantic technologies: A hybrid data-knowledge
approach,” in 1st IEEE International Conference on Industrial Cyber-
Physical Systems (ICPS 2018), 2018, pp. 20–25.
[15] G. I. Webb, M. J. Pazzani, and D. Billsus, “Machine learning for user
modeling,” User modeling and user-adapted interaction, vol. 11, no.
1-2, pp. 19–29, 2001.
[16] M. Callara and P. Wira, “Machine learning pour l’analyse de com-
portements et la classification d’utilisateurs,” in Congrès National de
la Recherche des IUT (CNRIUT’2017), 2017.
[17] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles
and Techniques - Adaptive Computation and Machine Learning. The
MIT Press, 2009.
[18] T. F. Cox and M. A. A. Cox, Multidimensional scaling, 2nd ed., ser.
Monographs on statistics and applied probability. Boca Raton, Fla.:
Chapman & Hall/CRC, 2001, no. 88.
[19] R. Xu and D. C. Wunsch, “Survey of clustering algorithms,” IEEE
Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.
[20] C. O. Wu, “A Cross-Validation Bandwidth Choice for Kernel Density
Estimates with Selection Biased Data,” Journal of Multivariate Analysis,
vol. 61, no. 1, pp. 38 – 60, 1997.