Diagram of the Continuous Training Traffic Classification system based on TIE

Source publication

K-Dimensional Trees for Continuous Traffic Classification

Conference Paper

Full-text available

Apr 2010

The network measurement community has proposed multiple machine learning (ML) methods for traffic classification in recent years. Although several studies have reported accuracies over 90%, most network operators still use either obsolete (e.g., port-based) or extremely expensive (e.g., pattern matching) methods for traffic classificat...

Context 1

... this section, we show the interaction of our KD-Tree plugin with the rest of the TIE architecture, and describe the modifications made to TIE to allow our plugin to continuously retrain itself. Figure 1 shows the data flow of our continuous training system based on TIE. The first three modules are used without any modification, as found in the original version of TIE. ...
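The excerpt does not say how the KD-Tree plugin classifies flows or what triggers retraining. As a rough illustration only, the sketch below assumes nearest-neighbour lookup over numeric flow feature vectors and treats retraining as rebuilding the tree from an enlarged labelled set. The feature values, the labels ("web", "p2p", "dns"), and the function names `build`, `nearest`, and `retrain` are all hypothetical and are not taken from TIE or the paper.

```python
# Minimal KD-tree sketch (assumption: nearest-neighbour classification of
# flow feature vectors; not the plugin's actual implementation).

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build(points, depth=0):
    """Build a KD-tree from a list of (feature_vector, label) pairs.

    Each node is a tuple: ((vector, label), split_axis, left, right).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # median becomes the splitter
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return (squared_distance, label) of the stored point closest to query."""
    if node is None:
        return best
    (vec, label), axis, left, right = node
    d = sq_dist(vec, query)
    if best is None or d < best[0]:
        best = (d, label)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best

def retrain(samples, new_samples):
    """Hypothetical retraining step: rebuild the tree with the extra samples."""
    return build(samples + new_samples)

# Hypothetical two-feature flows with made-up labels.
samples = [((1.0, 1.0), "web"), ((2.0, 2.0), "web"),
           ((8.0, 9.0), "p2p"), ((9.0, 8.0), "p2p")]
tree = build(samples)
```

Rebuilding the whole tree is the simplest way to model retraining here; the excerpt gives no detail on whether the real system rebuilds or updates incrementally.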


Correlation analysis of performance metrics for classifier

Conference Paper

Full-text available

Sep 2014

The correct selection of performance metrics is one of the key issues in evaluating a classifier's performance. Although many performance metrics have been proposed and used in the machine learning community, there is no common conclusion among practitioners regarding which metric to choose for evaluating a classifier's performance. In this pa...

A Multi-dimensional Genetic Programming Approach for Multi-class Classification Problems

Conference Paper

Full-text available

Apr 2014

Classification problems are of profound interest for the machine learning community as well as to an array of application fields. However, multi-class classification problems can be very complex, in particular when the number of classes is high. Although very successful in so many applications, GP was never regarded as a good method to perform mult...

Distribution-Based Categorization of Classifier Transfer Learning

Article

Full-text available

Dec 2017

Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained significant interest in the Machine Learning community since it paves the way to devise intelligent learning mode...

Combining Probabilistic Classifiers for Text Classification

Article

Full-text available

Aug 2014

Probabilistic classifiers are considered to be among the most popular classifiers for the machine learning community and are used in many applications. Although popular probabilistic classifiers exhibit very good performance when used individually in a specific classification task, very little work has been done on assessing the performance of two...

The Application of Deep Learning for Network Traffic Classification

Article

Full-text available

Apr 2023

Jingran Yang

The classification, detection, and analysis of routine network traffic has been a hot topic for businesses and research institutions due to the proliferation of Internet of Things devices and the explosive development of networks. Traditional methods for categorizing network traffic primarily employ common machine learning algorithms, e.g., decision trees and naive Bayes algorithms, but as deep learning technology advances, it is being applied to traffic classification with increasing success. This study examines existing deep learning-based network traffic classification techniques and focuses on the categorization of computer network traffic. First, the research background of the topic is introduced; then traffic classification based on deep learning is described, covering approaches based on Stacked Autoencoders, Convolutional Neural Networks, and Recurrent Neural Networks. Following this investigation, the paper concludes that Long Short-Term Memory and Convolutional Neural Network models are the best deep learning models for traffic classification, with the three-dimensional Convolutional Neural Network outperforming the others.

DISTILLER: Encrypted Traffic Classification via Multimodal Multitask Deep Learning

Article

Full-text available

Jan 2021
J NETW COMPUT APPL

Traffic classification, i.e. the inference of applications and/or services from their network traffic, represents the workhorse for service management and the enabler for valuable profiling information. The growing trend toward encrypted protocols and the fast-evolving nature of network traffic are obsoleting the traffic-classification design solutions based on payload-inspection or machine learning. Conversely, deep learning is currently foreseen as a viable means to design traffic classifiers based on automatically-extracted features. These reflect the complex patterns distilled from the multifaceted (encrypted) traffic, that implicitly carries information in "multimodal" fashion, and can be also used in application scenarios with diversified network visibility for (simultaneously) tackling multiple classification tasks. To this end, in this paper a novel multimodal multitask deep learning approach for traffic classification is proposed, leading to the Distiller classifier. The latter is able to capitalize traffic-data heterogeneity (by learning both intra- and inter-modality dependencies), overcome performance limitations of existing (myopic) single-modal deep learning-based traffic classification proposals, and simultaneously solve different traffic categorization problems associated to different providers' desiderata. Based on a public dataset of encrypted traffic, we evaluate Distiller in a fair comparison with state-of-the-art deep learning architectures proposed for encrypted traffic classification (and based on single-modality philosophy). Results show the gains of our proposal over both multitask extensions of single-task baselines and native multitask architectures.

Toward Effective Mobile Encrypted Traffic Classification through Deep Learning

Article

Full-text available

May 2020
NEUROCOMPUTING

Traffic Classification (TC), i.e. inferring the applications that generate network traffic, is currently the enabler for valuable profiling information, other than being the workhorse for service differentiation/blocking. Further, TC is fostered by the blooming of mobile (mostly encrypted) traffic volumes, fueled by the huge adoption of hand-held devices. While researchers and network operators still rely on machine learning to pursue accurate inference, we envision the Deep Learning (DL) paradigm as the stepping stone toward the design of practical (and effective) mobile traffic classifiers based on automatically-extracted features, able to operate with encrypted traffic, and reflecting complex traffic patterns. In this context, the paper's contribution is four-fold. First, it provides a taxonomy of the key network traffic analysis subjects where DL is foreseen as attractive. Secondly, it delves into the non-trivial adoption of DL to mobile TC, surfacing potential gains. Thirdly, to capitalize on such gains, it proposes and validates a general framework for DL-based encrypted TC. Two concrete instances originating from our framework are then experimentally evaluated on three mobile datasets of human users' activity. Lastly, our framework is leveraged to point to future research perspectives.

Improved KNN Algorithm for Fine-Grained Classification of Encrypted Network Flow

Article

Full-text available

Feb 2020

The fine-grained classification of encrypted traffic is important for network security analysis. Malicious attacks are usually encrypted and simulated as normal application or content traffic. Supervised machine learning methods are widely used for traffic classification and show good performances. However, they need a large amount of labeled data to train a model, while labeled data is hard to obtain. Aiming at solving this problem, this paper proposes a method to train a model based on the K-nearest neighbor (KNN) algorithm, which only needs a small amount of data. Due to the fact that the importance of different traffic features varies, and traditional KNN does not highlight the importance of different features, this study introduces the concept of feature weight and proposes the weighted feature KNN (WKNN) algorithm. Furthermore, to obtain the optimal feature set and the corresponding feature weight set, a feature selection and feature weight self-adaptive algorithm for WKNN is proposed. In addition, a three-layer classification framework for encrypted network flows is established. Based on the improved KNN and the framework, this study finally presents a method for fine-grained classification of encrypted network flows, which can identify the encryption status, application type and content type of encrypted network flows with high accuracies of 99.3%, 92.4%, and 97.0%, respectively.
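The abstract above introduces per-feature weights into KNN but does not give the distance formula or the weight-selection procedure. The following is a minimal sketch under the assumption that each weight scales the squared difference of its feature inside a Euclidean distance and that the prediction is a plain majority vote. The names `weighted_distance` and `wknn_predict`, the sample data, and the weight values are hypothetical, and the paper's self-adaptive weight search is not reproduced.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a feature weight.

    Assumption: this is one common way to weight features; the paper's exact
    formula is not given in the abstract.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def wknn_predict(train, query, weights, k=3):
    """Return the majority label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples: class "A" is close to the query on the
# first feature, class "B" is close on the second.
train = [((1.0, 10.0), "A"), ((1.1, 9.0), "A"), ((0.9, 11.0), "A"),
         ((5.0, 1.0), "B"), ((5.2, 0.5), "B"), ((4.8, 1.5), "B")]
query = (1.0, 1.0)

# Changing which feature carries the weight changes the predicted class.
label_first = wknn_predict(train, query, weights=(1.0, 0.0))
label_second = wknn_predict(train, query, weights=(0.0, 1.0))
```

With these made-up values, weighting only the first feature selects the "A" neighbours and weighting only the second selects the "B" neighbours, which illustrates why the choice of weights matters for the classifier.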

MIRAGE: Mobile-app Traffic Capture and Ground-truth Creation

Conference Paper

Full-text available

Oct 2019

Network traffic analysis, i.e. the umbrella of procedures for distilling information from network traffic, represents the enabler for highly-valuable profiling information, other than being the workhorse for several key network management tasks. While it is currently being revolutionized in its nature by the rising share of traffic generated by mobile and hand-held devices, existing design solutions are mainly evaluated on private traffic traces, and only a few public datasets are available, thus clearly limiting repeatability and further advances on the topic. To this end, this paper introduces and describes MIRAGE, a reproducible architecture for mobile-app traffic capture and ground-truth creation. The outcome of this system is MIRAGE-2019, a human-generated dataset for mobile traffic analysis (with associated ground-truth) having the goal of advancing the state-of-the-art in mobile app traffic analysis. A first statistical characterization of the mobile-app traffic in the dataset is provided in this paper. Still, MIRAGE is expected to be capitalized by the networking community for different tasks related to mobile traffic analysis.

Flexible Prediction of CT Images From MRI Data Through Improved Neighborhood Anchored Regression for PET Attenuation Correction

Article

Full-text available

Jul 2019

Given the complicated relationship between the Magnetic Resonance Imaging (MRI) signals and the attenuation values, the attenuation correction in hybrid Positron Emission Tomography (PET)/MRI systems remains a challenging task. Currently, existing methods are either time-consuming or require sufficient samples to train the models. In this work, an efficient approach for predicting pseudo computed tomography (CT) images from T1- and T2-weighted MRI data with limited data is proposed. The proposed approach uses improved neighborhood anchor regression (INAR) as a baseline method to pre-calculate projected matrices to flexibly predict the pseudo CT patches. Techniques, including the augmentation of the MR/CT dataset, learning of the nonlinear descriptors of MR images, hierarchical search for nearest neighbors, data-driven optimization, and multi-regressor ensemble, are adopted to improve the effectiveness of the proposed approach. In total, 22 healthy subjects were enrolled in the study. The pseudo CT images obtained using INAR with multi-regressor ensemble yielded mean absolute error (MAE) of 92.73 ± 14.86 HU, peak signal-to-noise ratio of 29.77 ± 1.63 dB, Pearson linear correlation coefficient of 0.82 ± 0.05, dice similarity coefficient of 0.81 ± 0.03, and the relative mean absolute error (rMAE) in PET attenuation correction of 1.30 ± 0.20% compared with true CT images. Moreover, our proposed INAR method, without any refinement strategies, can achieve considerable results with only seven subjects (MAE 106.89 ± 14.43, rMAE 1.51 ± 0.21%). The experiments prove the superior performance of the proposed method over the six innovative methods. Moreover, the proposed method can rapidly generate the pseudo CT images that are suitable for PET attenuation correction.

Mobile Encrypted Traffic Classification Using Deep Learning: Experimental Evaluation, Lessons Learned, and Challenges

Article

Full-text available

Feb 2019

The massive adoption of hand-held devices has led to the explosion of mobile traffic volumes traversing home and enterprise networks, as well as the Internet. Traffic Classification (TC), i.e. the set of procedures for inferring (mobile) applications generating such traffic, has become nowadays the enabler for highly-valuable profiling information (with certain privacy downsides), other than being the workhorse for service differentiation/blocking. Nonetheless, the design of accurate classifiers is exacerbated by the rising adoption of encrypted protocols (such as TLS), hindering the suitability of (effective) deep packet inspection approaches. Also, the fast-expanding set of apps and the moving-target nature of mobile traffic make design solutions with usual machine learning, based on manually- and expert-originated features, outdated and unable to keep the pace. For these reasons Deep Learning (DL) is here proposed, for the first time, as a viable strategy to design practical mobile traffic classifiers based on automatically-extracted features, able to cope with encrypted traffic, and reflecting their complex traffic patterns. To this end, different state-of-the-art DL techniques from (standard) TC are here reproduced, dissected (highlighting critical choices), and set into a systematic framework for comparison, including also a performance evaluation workbench. The latter outcome, although declined in the mobile context, has the applicability appeal to the wider umbrella of encrypted TC tasks. Finally, the performance of these DL classifiers is critically investigated based on an exhaustive experimental validation (based on three mobile datasets of real human users' activity), highlighting the related pitfalls, design guidelines, and challenges.

Speeding-Up DPI Traffic Classification with Chaining

Conference Paper

Dec 2018

Detecting and diagnosing anomalies in cellular networks using Random Neural Networks

Conference Paper

Sep 2016

On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Article

Full-text available

Jan 2016

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
