Article · Publisher preview available

An empirical approach towards characterization of encrypted and unencrypted VoIP traffic

Authors: Paromita Choudhury, K. R. Prasanna Kumar, Sukumar Nandi, G. Athithan

Abstract and Figures

VoIP traffic classification plays a major role in network policy enforcement. Characterization of VoIP media traffic is based on codec behaviour. With the introduction of variable bit rate codecs, coding, compression and encryption present different complexities with respect to the classification of VoIP traffic. Randomness tests do not extend directly to the classification of compressed and encrypted VoIP traffic. This paper examines the applicability of randomness tests to encrypted and unencrypted VoIP traffic with constant bit rate and variable bit rate codecs. A novel method, Construction-by-Selection, which constructs a test sequence from the partial payload data of a VoIP media session, is proposed in this paper. Experimental results for this method show that such a construction exhibits randomness and hence allows encrypted VoIP media traffic to be differentiated from unencrypted VoIP media traffic, even in the case of variable bit rate codecs.
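This preview does not spell out the Construction-by-Selection procedure, but the idea it describes, assembling a test sequence from partial payload bytes of a media session and checking that sequence for randomness, can be sketched roughly as follows. The fixed byte offset, the monobit frequency test, and the synthetic payloads are illustrative assumptions, not the paper's actual construction or test battery; on a real capture the payloads would come from the RTP packets of one media session.

```python
import math
import os

def build_test_sequence(payloads, offset=0):
    """Construct a byte sequence by selecting one byte (at a fixed
    offset) from each RTP payload of a media session.
    Illustrative stand-in for the paper's Construction-by-Selection."""
    return bytes(p[offset] for p in payloads if len(p) > offset)

def monobit_p_value(data):
    """NIST SP 800-22 frequency (monobit) test on a byte string.
    Returns a p-value; p >= 0.01 is consistent with randomness."""
    n = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# Encrypted payloads should look random; plain codec frames often do not.
encrypted_like = [os.urandom(160) for _ in range(200)]   # stand-in for SRTP payloads
structured     = [bytes(range(160)) for _ in range(200)] # stand-in for plain frames

print(monobit_p_value(build_test_sequence(encrypted_like)))  # typically >= 0.01
print(monobit_p_value(build_test_sequence(structured)))      # ~0: fails randomness
```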
Paromita Choudhury¹ · K. R. Prasanna Kumar¹ · Sukumar Nandi² · G. Athithan³
Received: 9 August 2018 / Revised: 11 June 2019 / Accepted: 6 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Keywords: VoIP · Codec · Encryption · Compression · Hamming distance · Auto-correlation · Randomness test
https://doi.org/10.1007/s11042-019-08088-w
* Paromita Choudhury: paromitaz@gmail.com
K. R. Prasanna Kumar: prasanna@cair.drdo.in
Sukumar Nandi: sukumar@iitg.ernet.in
G. Athithan: athithan.g@gmail.com
¹ CAIR, DRDO, Bangalore, India
² Department of CSE, IIT-Guwahati, Guwahati, India
³ DRDO HQ, Delhi, India
Multimedia Tools and Applications (2020) 79:603–631
Published online: 4 September 2019
... While Shannon's entropy has been the technique of choice in the majority of reviewed research, some researchers have used alternative techniques to identify the randomness of a file's contents. These alternative techniques include: chi-square calculation [18,31-35]; the Kullback-Leibler [30] technique; and serial byte correlation [8,36]. In most of the research, though, no specific reason is provided as to why the specific entropy calculation is selected, and it is common for the evaluation to be performed on an undefined dataset of limited scope and variety of file types. ...
... The suite consists of 15 distinct tests, which analyse various structural aspects of a byte sequence. These tests are commonly employed as a benchmark for distinguishing compressed and encrypted content (e.g., [18,31]). Each test analyses a particular property of the sequence, and subsequently applies a test-specific decision rule to determine whether the result of the analysis suggests randomness or not. ...
Article
Full-text available
Ransomware is a malicious class of software that utilises encryption to implement an attack on system availability. The target's data remains encrypted and is held captive by the attacker until a ransom demand is met. A common approach used by many crypto-ransomware detection techniques is to monitor file system activity and attempt to identify encrypted files being written to disk, often using a file's entropy as an indicator of encryption. However, the description of these techniques often includes little or no discussion as to why a particular entropy calculation technique is selected, or any justification as to why one technique is selected over the alternatives. The Shannon method of entropy calculation is the most commonly used technique when it comes to file encryption identification in crypto-ransomware detection techniques. Overall, correctly encrypted data should be indistinguishable from random data, so apart from the standard mathematical entropy calculations such as Chi-Square (χ²), Shannon Entropy and Serial Correlation, the test suites used to validate the output from pseudo-random number generators would also be suited to perform this analysis. The hypothesis is that there is a fundamental difference between different entropy methods and that the best methods may be used to better detect ransomware-encrypted files. The paper compares the accuracy of 53 distinct tests in being able to differentiate between encrypted data and other file types. The testing is broken down into two phases, the first phase is used to identify potential candidate tests, and a second phase where these candidates are thoroughly evaluated. To ensure that the tests were sufficiently robust, the NapierOne dataset is used. This dataset contains thousands of examples of the most commonly used file types, as well as examples of files that have been encrypted by crypto-ransomware. During the second phase of testing, 11 candidate entropy calculation techniques were tested against more than 270,000 individual files, resulting in nearly three million separate calculations. The overall accuracy of each of the individual test's ability to differentiate between files encrypted using crypto-ransomware and other file types is then evaluated and each test is compared using this metric in an attempt to identify the entropy method most suited for encrypted file identification. An investigation was also undertaken to determine whether a hybrid approach, in which the results of multiple tests are combined, could achieve an improvement in accuracy.
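As a concrete illustration of the metrics this abstract compares, the sketch below computes Shannon entropy, a chi-square statistic against a uniform byte distribution, and lag-1 serial byte correlation for a buffer. The random buffer stands in for an encrypted file; none of the thresholds or datasets from the paper are reproduced here.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 = looks fully random)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def chi_square_uniform(data: bytes) -> float:
    """Chi-square statistic of the byte histogram vs. a uniform
    distribution; for random data it stays near 255 (the df)."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

def serial_correlation(data: bytes) -> float:
    """Lag-1 serial correlation of byte values; near 0 for random data."""
    n = len(data) - 1
    x, y = data[:-1], data[1:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

buf = os.urandom(65536)  # stand-in for an encrypted file
print(shannon_entropy(buf), chi_square_uniform(buf), serial_correlation(buf))
```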
... Researchers assert that a good indicator [37,54-56] of crypto-ransomware activity is the generation of files whose contents appear to be random and contain no distinguishable structure. It is agreed that well-encrypted data should be indistinguishable from random data [57]. Traditionally, researchers in crypto-ransomware detection have chosen to use the value known as Shannon entropy [58] when calculating this metric; however, in this research it was decided to use the chi-square [59] method of calculating this metric, based on the findings of Davies [60]. ...
Article
Full-text available
Crypto-ransomware remains a significant threat to governments and companies alike, with high-profile cyber security incidents regularly making headlines. Many different detection systems have been proposed as solutions to the ever-changing dynamic landscape of ransomware detection. In the majority of cases, these described systems propose a method based on the result of a single test performed on either the executable code, the process under investigation, its behaviour, or its output. In a small subset of ransomware detection systems, the concept of a scorecard is employed where multiple tests are performed on various aspects of a process under investigation and their results are then analysed using machine learning. The purpose of this paper is to propose a new majority voting approach to ransomware detection by developing a method that uses a cumulative score derived from discrete tests based on calculations using algorithmic rather than heuristic techniques. The paper describes 23 candidate tests, as well as 9 Windows API tests, which are validated to determine both their accuracy and viability for use within a ransomware detection system. Using a cumulative score calculation approach to ransomware detection has several benefits, such as immunity to the occasional inaccuracy of individual tests when making its final classification. The system can also leverage multiple tests that can be both comprehensive and complementary in an attempt to achieve a broader, deeper, and more robust analysis of the program under investigation. Additionally, the use of multiple collaborative tests significantly hinders ransomware from masking or modifying its behaviour in an attempt to bypass detection. The results achieved by this research demonstrate that many of the proposed tests achieved a high degree of accuracy in differentiating between benign and malicious targets and suggestions are offered as to how these tests, and combinations of tests, could be adapted to further improve the detection accuracy.
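A minimal sketch of the cumulative-score idea described above: several independent tests each cast a vote on a file's bytes, and the final classification follows the majority. The two placeholder tests and the 0.5 cutoff are illustrative assumptions, not the paper's 23 candidate tests.

```python
import math
import os
from collections import Counter
from typing import Callable, List

# Each test inspects a file's bytes and votes: True = "looks encrypted/suspicious".
Test = Callable[[bytes], bool]

def high_entropy(data: bytes) -> bool:
    n = len(data)
    h = -sum(c / n * math.log2(c / n) for c in Counter(data).values())
    return h > 7.9                     # placeholder threshold

def no_known_magic(data: bytes) -> bool:
    magics = (b"\x89PNG", b"%PDF", b"PK\x03\x04", b"\xff\xd8\xff")
    return not data.startswith(magics)

def cumulative_score(data: bytes, tests: List[Test]) -> float:
    """Fraction of tests voting 'suspicious'; a majority (or any other
    cutoff) then drives the final classification."""
    votes = [t(data) for t in tests]
    return sum(votes) / len(votes)

sample = os.urandom(4096)              # stand-in for a ransomware-written file
score = cumulative_score(sample, [high_entropy, no_known_magic])
print("classified as suspicious:", score >= 0.5)   # simple majority vote
```

A benefit of this structure, as the abstract notes, is that one occasionally wrong test does not flip the final decision on its own.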
Chapter
Network packet identification and classification is a fundamental requirement of network management for maintaining quality of service, quality of experience, efficient bandwidth utilization, etc. This becomes increasingly significant in light of the rapid expansion of the Internet and online applications. With the advent of secure applications, more and more encrypted traffic proliferates on the Internet. Peer-to-peer applications with user-defined protocols, in particular, severely affect network management, so it is necessary to identify and classify encrypted traffic in a network. To this end, our proposed network-packet binary classification approach classifies network traffic into encrypted or compressed packets, with better classification accuracy and a limited amount of classification time. To achieve this, our model uses a decision tree classifier together with an efficient feature selection method, the autoencoder. Our experimental results show that our model outperforms most state-of-the-art methods in terms of classification accuracy. Our model achieved 100% classification accuracy within 0.009 s of processing time.
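A rough sketch of the pipeline this abstract describes: an autoencoder for feature reduction feeding a decision tree classifier, here built with scikit-learn. The synthetic data, the hidden-layer size, and the use of MLPRegressor trained to reconstruct its input as the autoencoder are all assumptions for illustration, not the authors' configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-ins: 64 histogram-style features per packet, two classes.
X = rng.random((2000, 64))
y = rng.integers(0, 2, 2000)   # 0 = compressed, 1 = encrypted (illustrative)

# Autoencoder: train an MLP to reconstruct its input, then reuse the
# first (hidden) layer as a compact feature representation.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)

def encode(X):
    # Hidden-layer activations = relu(X W + b) from the trained autoencoder.
    return np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

Xtr, Xte, ytr, yte = train_test_split(encode(X), y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
print("accuracy:", clf.score(Xte, yte))   # ~0.5 on this random toy data
```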
Article
Full-text available
As the size and source of network traffic increase, so does the challenge of monitoring and analysing network traffic. Therefore, sampling algorithms are often used to alleviate these scalability issues. However, the use of high-entropy data streams, through the use of either encryption or compression, further compounds the challenge, as current state-of-the-art algorithms cannot accurately and efficiently differentiate between encrypted and compressed packets. In this work, we propose a novel traffic classification method named HEDGE (High Entropy DistinGuishEr) to distinguish between compressed and encrypted traffic. HEDGE is based on the evaluation of the randomness of the data streams and can be applied to individual packets without the need to have access to the entire stream. Findings from the evaluation show that our approach outperforms the current state of the art. We also make available our statistically sound dataset, based on known benchmarks, to the wider research community.
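The kind of per-packet randomness evaluation HEDGE is built on might look like the following sketch, which applies a chi-square uniformity test to a single payload. The cutoff value is an illustrative assumption; the demo separates random-looking from plaintext payloads, while separating encrypted from compressed data is the harder case the paper addresses by combining several such tests.

```python
import os
from collections import Counter

def chi_square_stat(payload: bytes) -> float:
    """Chi-square of the byte histogram against uniformity."""
    expected = len(payload) / 256
    counts = Counter(payload)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

def looks_random(payload: bytes, cutoff: float = 330.0) -> bool:
    """Single-packet decision: random-looking data tracks the uniform
    distribution closely (statistic near the 255 degrees of freedom),
    while structured data drifts far above it. Cutoff is illustrative."""
    return chi_square_stat(payload) < cutoff

print(looks_random(os.urandom(1400)))                       # True: random-like
print(looks_random(b"GET /index.html HTTP/1.1\r\n" * 50))   # False: plaintext
```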
Article
Full-text available
We investigate the performance of three different machine learning algorithms, namely C5.0, AdaBoost and Genetic Programming (GP), to generate robust classifiers for identifying VoIP encrypted traffic. To this end, a novel approach (Alshammari and Zincir-Heywood, 2011) based on machine learning is employed to generate robust signatures for classifying VoIP encrypted traffic. We apply statistical calculations on network flows to extract a feature set that includes neither payload information nor information based on source and destination port numbers and IP addresses. Our results show that finding and employing the most suitable sampling and machine learning techniques can improve the performance of classifying VoIP significantly.
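The flow-level feature extraction described here, statistics computed without payload bytes, port numbers, or IP addresses, could be sketched as follows, with AdaBoost as one of the three evaluated learners. The specific features and the synthetic flows are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def flow_features(pkt_sizes, pkt_times):
    """Statistical features of one flow, computed without payload,
    ports, or IP addresses (illustrative selection)."""
    sizes = np.asarray(pkt_sizes, dtype=float)
    iats = np.diff(np.asarray(pkt_times, dtype=float))  # inter-arrival times
    return [sizes.mean(), sizes.std(), sizes.min(), sizes.max(),
            iats.mean(), iats.std(), len(sizes)]

# Toy training data: one feature row per flow, label 1 = VoIP.
rng = np.random.default_rng(0)
flows, labels = [], []
for _ in range(500):
    voip = int(rng.integers(0, 2))
    n = 100
    sizes = rng.normal(160 if voip else 800, 10 if voip else 400, n).clip(40, 1500)
    times = np.cumsum(rng.exponential(0.02 if voip else 0.2, n))
    flows.append(flow_features(sizes, times))
    labels.append(voip)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(flows, labels)
print("training accuracy:", clf.score(flows, labels))
```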
Article
Full-text available
Digital cryptography relies greatly on randomness in providing the security requirements imposed by various information systems. Just as different requirements call for specific cryptographic techniques, randomness takes on a variety of roles in order to ensure the proper strength of these cryptographic primitives. The purpose of this survey is to emphasize the importance of randomness in digital cryptography by identifying and highlighting the many roles random number sequences play, and to carry forth the significance of choosing and integrating suitable random number generators, as security flaws in the generator can easily compromise the security of the whole system.
Article
With a profusion of network applications, traffic classification plays a crucial role in network management and policy-based security control. The widely used encryption transmission protocols, such as the Secure Socket Layer/Transport Layer Security (SSL/TLS) protocols, lead to the failure of traditional payload-based classification methods. Existing methods for encrypted traffic classification cannot achieve high discrimination accuracy for applications with similar fingerprints. In this paper, we propose an attribute-aware encrypted traffic classification method based on second-order Markov chains. We start by exploring approaches that can further improve the performance of existing methods in terms of discrimination accuracy, and make the promising observation that the application attribute bigram, which consists of the certificate packet length and the first application data size in SSL/TLS sessions, contributes to application discrimination. To increase the diversity of application fingerprints, we develop a new method by incorporating the attribute bigrams into second-order homogeneous Markov chains. Extensive evaluation results show that the proposed method can improve the classification accuracy by 29% on average compared with the state-of-the-art Markov-based method.
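A second-order Markov-chain fingerprint of the kind this abstract builds on can be sketched as follows: per-application transition probabilities are estimated over pairs of preceding states, and a session is assigned to the application whose fingerprint gives it the highest log-likelihood. The state alphabet (bucketed record lengths) and the Laplace smoothing are illustrative assumptions; the paper's attribute bigrams are not reproduced here.

```python
import math
from collections import defaultdict

class SecondOrderMarkovFingerprint:
    """Per-application fingerprint: P(next_state | previous two states),
    estimated from training sequences of packet/record attributes."""
    def __init__(self, alpha=1.0, n_states=16):
        self.alpha = alpha            # Laplace smoothing
        self.n_states = n_states
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for a, b, c in zip(seq, seq[1:], seq[2:]):
                self.counts[(a, b)][c] += 1
        return self

    def log_likelihood(self, seq):
        ll = 0.0
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            ctx = self.counts[(a, b)]
            total = sum(ctx.values()) + self.alpha * self.n_states
            ll += math.log((ctx[c] + self.alpha) / total)
        return ll

def bucket(length, width=100):
    """Map a record length to a coarse state (assumed discretisation)."""
    return min(length // width, 15)

# Classification: score a session against each application's fingerprint
# and pick the argmax.
fp_a = SecondOrderMarkovFingerprint().fit([[bucket(x) for x in (150, 320, 150, 320, 150)]])
fp_b = SecondOrderMarkovFingerprint().fit([[bucket(x) for x in (1400, 1400, 90, 1400, 90)]])
session = [bucket(x) for x in (150, 320, 150, 320)]
print("app A" if fp_a.log_likelihood(session) > fp_b.log_likelihood(session) else "app B")
```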
Article
With the widespread use of encrypted data transport, network traffic encryption is becoming a standard nowadays. This presents a challenge for traffic measurement, especially for analysis and anomaly detection methods, which are dependent on the type of network traffic. In this paper, we survey existing approaches for classification and analysis of encrypted traffic. First, we describe the most widespread encryption protocols used throughout the Internet. We show that the initiation of an encrypted connection and the protocol structure give away much information for encrypted traffic classification and analysis. Then, we survey payload and feature-based classification methods for encrypted traffic and categorize them using an established taxonomy. The advantage of some of the described classification methods is the ability to recognize the encrypted application protocol in addition to the encryption protocol. Finally, we make a comprehensive comparison of the surveyed feature-based classification methods and present their weaknesses and strengths.
Conference Paper
Application identification in the middle of the network is one of the key challenges for network operators in managing application-based traffic and policy controls on the Internet. However, it is becoming harder as end-to-end encrypted traffic increases, since application-specific information can hardly be read from packets. We previously proposed a method to identify the application of traffic whether or not the traffic is encrypted. Our method identifies encrypted traffic as accurately as unencrypted traffic, but it requires offline processing to obtain statistics over whole flows. Real-time identification is important, but its accuracy suffers from unstable flow statistics. In this paper we therefore propose an approach to improve the accuracy of identifying encrypted traffic in real time. We first clarify the number of packets sufficient for accurate identification, and then present a method to infer the statistics so as to improve accuracy even when the number of packets obtained is smaller than required. Experimental results show that the proposed approach achieves accuracy almost as high as the offline method.
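Keeping flow statistics available after every packet, so that identification can be attempted in real time before the flow ends, might be implemented along the following lines using Welford's online mean/variance. The packet sizes and the "enough packets" cutoff are placeholders, not the thresholds the paper derives.

```python
class RunningFlowStats:
    """Welford's online mean/variance over packet sizes, so statistics
    are available after every packet instead of only at flow end."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, size):
        self.n += 1
        delta = size - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (size - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0

stats = RunningFlowStats()
for size in (160, 158, 161, 159):   # first packets of a flow (illustrative)
    stats.update(size)
    if stats.n >= 3:                # assumed "enough packets" cutoff
        print(stats.n, round(stats.mean, 1), round(stats.variance, 2))
```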
Conference Paper
In this paper, we propose stochastic fingerprints for application traffic flows conveyed in Secure Socket Layer/Transport Layer Security (SSL/TLS) sessions. The fingerprints are based on first-order homogeneous Markov chains, whose parameters we identify from observed training application traces. As the fingerprint parameters of the chosen applications differ considerably, the method results in very good accuracy of application discrimination and provides a possibility of detecting abnormal SSL/TLS sessions. Our analysis of the results reveals that application discrimination mainly derives from incorrect implementation practice, the misuse of the SSL/TLS protocol, various server configurations, and the nature of the applications.
Conference Paper
Classification of Voice over Internet Protocol (VoIP) traffic is important for network management operations. The media traffic, which carries the voice on the Real-time Transport Protocol (RTP), is subject to variation in transmitted packet sizes and content due to the usage of Variable Bit Rate (VBR) codecs. In the absence of session-level information, the RTP header does not uniquely identify VBR voice codecs defined as dynamic payload types. In this paper we present a method to classify VoIP traffic coded with three VBR codecs: iSAC, SILK and Speex. We first formulate features to characterize an RTP flow based on packet sizes and entropy values of the packet content. The features are used for codec-based classification of RTP traffic using machine learning techniques. The paper reports classification results using three machine learning algorithms, namely 1-NN, C4.5 and Naive Bayes. The results show an accuracy of over 98% for offline classification with the reduced feature set. The paper also presents the performance of the classifiers with varying amounts of available traffic.
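The reduced feature set this abstract formulates, packet sizes plus entropy values of packet content per RTP flow, could be sketched like this, with 1-NN as one of the three reported classifiers. The synthetic CBR/VBR flows and the exact feature choices are assumptions for illustration.

```python
import math
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def payload_entropy(payload: bytes) -> float:
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in Counter(payload).values())

def rtp_flow_features(payloads):
    """Per-flow features from packet sizes and content entropy
    (an illustrative reduced feature set)."""
    sizes = np.array([len(p) for p in payloads], dtype=float)
    ents = np.array([payload_entropy(p) for p in payloads])
    return [sizes.mean(), sizes.std(), ents.mean(), ents.std()]

# Synthetic flows for two "codecs": one constant bit rate, one variable.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    vbr = int(rng.integers(0, 2))
    lens = rng.integers(20, 120, 50) if vbr else np.full(50, 160)
    payloads = [rng.bytes(int(l)) for l in lens]
    X.append(rtp_flow_features(payloads))
    y.append(vbr)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)   # the paper's 1-NN setting
print("training accuracy:", clf.score(X, y))  # trivially high on training data;
                                              # evaluate on held-out flows in practice
```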