Fig 1 - uploaded by Michal Turcanik
Web users clustering algorithm.

Source publication
Conference Paper
Full-text available
The aim of this paper is to analyse the possibility of clustering the users of a selected network on the basis of their browsing behaviour. The source of information is a web access log file, which contains all the important information. The paper presents the idea of pre-processing information from a web access log file and it also presents the K-means clustering algorithm use...

Contexts in source publication

Context 1
... discovered clusters can identify groups of users with the same behaviour from different points of view. The algorithm for clustering web users is shown in Fig 1. First, the web access log is preprocessed to obtain the required information. ...
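The two steps named in this excerpt (pre-processing the web access log, then clustering users with K-means) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the simplified `user host` log format, the `HOST_CATEGORY` mapping, the category list and the per-user feature vector (share of requests per site category) are all hypothetical, since the excerpt does not specify them.

```python
import random
from collections import Counter, defaultdict

# Hypothetical site categories and host mapping (not taken from the paper).
CATEGORIES = ["news", "social", "video"]
HOST_CATEGORY = {
    "news.example": "news",
    "social.example": "social",
    "video.example": "video",
}

def preprocess(log_lines):
    """Turn assumed 'user host' log lines into one category-share vector per user."""
    counts = defaultdict(Counter)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        user, host = parts
        category = HOST_CATEGORY.get(host)
        if category is not None:
            counts[user][category] += 1
    vectors = {}
    for user, per_category in counts.items():
        total = sum(per_category.values())
        vectors[user] = [per_category[c] / total for c in CATEGORIES]
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: assign each point to its nearest center, then recompute centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

A call such as `kmeans(list(preprocess(lines).values()), k=2)` would return one cluster label per user. The number of clusters `k` and the iteration count are free parameters here.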

Citations

... Methods such as selecting the initial centers based on data sparsity were proposed, which may effectively minimize algorithm iteration time and increase clustering quality. In [16] the researchers clustered users into networks according to their browsing behaviour. Their methodology consisted of web log pre-processing and clustering with the K-means algorithm, with the ultimate goal of grouping users into different categories and analyzing their behaviour based on the category of the web sites with which they interact. ...
... Log parsing is a technique for converting unstructured content from log messages into a format appropriate for data mining. The aim of the study [16] was to analyze how log parsing strategies using natural language processing affected log mining performance. The researchers utilized two datasets: the first consisted of log data collected in an aviation system which included over 4,500,000 messages gathered over the period of a year. ...
Preprint
Full-text available
Content delivery networks (CDNs) are the backbone of the Internet and are key in delivering high quality video on demand (VoD), web content and file services to billions of users. CDNs usually consist of hierarchically organized content servers positioned as close to the customers as possible. CDN operators face a significant challenge when analyzing billions of web server and proxy logs generated by their systems. The main objective of this study was to analyze the applicability of various clustering methods in CDN error log analysis. We worked with real-life CDN proxy logs, identified key features included in the logs (e.g., content type, HTTP status code, time-of-day, host) and clustered the log lines corresponding to different host types offering live TV, video on demand, file caching and web content. Our experiments were run on a dataset consisting of proxy logs collected over a 7-day period from a single, physical CDN server running multiple types of services (VoD, live TV, file). The dataset consisted of 2.2 billion log lines. Our analysis showed that CDN error clustering is a viable approach towards identifying recurring errors and improving overall quality of service.
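Before log lines can be clustered, features like those listed in this abstract (content type, HTTP status code, time-of-day) have to be turned into numbers. The sketch below shows one common way to do that, as an assumption rather than the study's actual encoding: the record keys, the vocabularies and the choice of one-hot plus cyclic hour encoding are hypothetical, and the host feature is left out for brevity.

```python
import math

# Hypothetical vocabularies; real CDN logs would contain many more values.
CONTENT_TYPES = ["video", "image", "text"]
STATUS_CLASSES = [2, 3, 4, 5]  # first digit of the HTTP status code

def one_hot(value, vocabulary):
    """Return a vector with 1.0 at the position of value, 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def encode(record):
    """Encode one parsed log record (a dict with assumed keys) as a numeric vector."""
    # Map the hour onto a circle so that 23:00 and 00:00 end up close together.
    angle = 2 * math.pi * record["hour"] / 24
    return (
        one_hot(record["content_type"], CONTENT_TYPES)
        + one_hot(record["status"] // 100, STATUS_CLASSES)
        + [math.sin(angle), math.cos(angle)]
    )
```

Vectors produced this way can be passed to any distance-based clustering method. Grouping status codes by their first digit is a simplification that trades detail for a smaller feature space.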