Top 20 malware families of DREBIN dataset.

Source publication

Evaluation of Advanced Ensemble Learning Techniques for Android Malware Detection

Article

Feb 2020

Android is the most well-known portable working framework having billions of dynamic clients worldwide that pulled in promoters, programmers, and cybercriminals to create malware for different purposes. As of late, wide-running inquiries have been led on malware examination and identification for Android gadgets while Android has likewise actualize...

Context 1

... experiments, we also use this dataset that has a total of 123,453 sample data for Android applications and contains as many as 545,333 behavioral features, where 5,560 applications contain malware samples from 179 different malware families and 5,560 are benign samples. The samples were collected in the period of August 2010-October 2012; the top 20 families of malware are listed in Table 1. ...

Figure 4. Top 20 important features with its feature importance score

The result of classification using different dataset of features

Exploring permissions in android applications using ensemble-based extra tree feature selection

Article

Jul 2020

The fast development of mobile apps and their usage has increased the risk of exploiting user privacy. One method used in the Android security mechanism is permission control, which restricts the access of apps to core facilities of devices. However, these permissions could be exploited by attackers when granting certain combinations of permissio...

A Vast Review of Recognizing the Presence of Android Malware Based on Ensemble Machine Learning Technique

Article

Jan 2024

SSCL-TransMD: Semi-Supervised Continual Learning Transformer for Malicious Software Detection

Article

Nov 2023

Machine learning-based malware (malicious software) detection methods have a wide range of real-world applications. However, these types of approaches suffer from the fatal problem of “model aging”, in which the validity of the model decreases rapidly as the malware continues to evolve and variants emerge continuously. The model aging problem is usually solved by model retraining, which relies on lots of labeled samples obtained at great expense. To address this challenge, this paper proposes a semi-supervised continuous learning malware detection model based on Transformer. Firstly, this model improves the lifelong semi-supervised mixture algorithm to dynamically adjust the weighted combination of new sample sequences and historical ones to solve the imbalance problem. Secondly, the Learning with Local and Global Consistency algorithm is used to iteratively compute similarity scores for the unlabeled samples in the mixed samples to obtain pseudo-labels. Lastly, the Multilayer Perceptron is applied for malware classification. To validate the effectiveness of the model, this paper conducts experiments on the CICMalDroid2020 dataset. The experimental results show that the proposed model performs better than existing deep learning detection models. The F1 score has an average improvement of 1.27% compared to other models when conducting binary classification. And, after inputting hybrid samples, including historical data and new data, four times, the F1 score is still 1.96% higher than other models.

MalHyStack: A Hybrid Stacked Ensemble Learning Framework with Feature Engineering Schemes for Obfuscated Malware Analysis

Article

Oct 2023

Since the advent of malware, it has reached a toll in this world that exchanges billions of data daily. Millions of people are victims of it, and the numbers are not decreasing as the year goes by. Malware is of various types in which obfuscation is a special kind. Obfuscated malware detection is necessary as it is not usually detectable and is prevalent in the real world. Although numerous works have already been done in this field so far, most of these works still need to catch up at some points, considering the scope of exploration through recent extensions. In addition to that, the application of a hybrid classification model is yet to be popularized in this field. Thus, in this paper, a novel hybrid classification model named, MalHyStack, has been proposed for detecting such obfuscated malware within the network. This proposed working model is built incorporating a stacked ensemble learning scheme, where conventional machine learning algorithms namely, Extremely Randomized Trees Classifier (ExtraTrees), Extreme Gradient Boosting (XgBoost) Classifier, and Random Forest are used in the first layer which is then followed by a deep learning layer in the second stage. Before utilizing the classification model for malware detection, an optimum subset of features has been selected using Pearson correlation analysis which improved the accuracy of the model by more than 2 % for multiclass classification. It also reduces time complexity by approximately two and three times for binary and multiclass classification, respectively. For evaluating the performance of the proposed model, a recently published balanced dataset named CIC-MalMem-2022 has been used. Utilizing this dataset, the overall experimental results of the proposed model represent a superior performance when compared to the existing classification models.
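The abstract above mentions selecting features with Pearson correlation analysis but does not say how the correlations are used. As a minimal sketch, assuming the goal is to drop one feature from each highly correlated pair, the idea could look like the following. The feature names, the toy values, and the 0.9 threshold are hypothetical and are not taken from the paper or from CIC-MalMem-2022.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has zero variance; treat its correlation as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns (one value per sample).
features = {
    "handles": [10, 20, 30, 40, 50],
    "handles_x2": [20, 40, 60, 80, 100],  # redundant: a multiple of "handles"
    "threads": [5, 3, 8, 1, 4],
}
selected = drop_correlated(features)
print(selected)
```

The redundant column is discarded because its correlation with an already-kept feature exceeds the threshold, while the weakly correlated column is retained. Ranking features by their correlation with the class label would be an equally plausible reading of the abstract.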

An Ensemble-Based Parallel Deep Learning Classifier with PSO-BP Optimization for Malware Detection

Article

Jul 2023

Digital networks and systems are susceptible to malicious software (malware) attacks. Deep learning (DL) models have recently emerged as effective methods to classify and detect malware. However, DL models often rely on gradient descent optimization in learning, i.e., the Back-Propagation (BP) algorithm; therefore, their training and optimization procedures suffer from several limitations, such as high computational cost and local suboptimal solutions. On the other hand, ensemble methods overcome the shortcomings of individual models by consolidating their strengths to increase performance. In this paper, we propose an ensemble-based parallel DL classifier for malware detection. In particular, a stacked ensemble learning method is developed, which leverages five DL base models and a neural network as a meta model. The DL models are trained and optimized with a hybrid optimization method based on BP and Particle Swarm Optimization (PSO) algorithms. To improve scalability and efficiency of the ensemble method, a parallel computing framework is exploited. The proposed ensemble method is evaluated using five malware datasets (namely, Drebin, NTAM, TOP-PE, DikeDataset, and ML_Android), and high accuracy rates of 99.2%, 99.3%, 98.7%, 100%, and 100% have been achieved, respectively. Its parallel implementation also significantly enhances the computational speed by a factor of up to 6.75. These results ascertain that the proposed ensemble method is effective, efficient, and scalable, outperforming many other compared methods in malware detection. Index terms: ensemble method, malware detection, deep learning, parallel processing, backpropagation algorithm, particle swarm optimization.

An ensemble deep learning classifier stacked with fuzzy ARTMAP for malware detection

Article

Apr 2023
J INTELL FUZZY SYST

Malicious software, or malware, has posed serious and evolving security threats to Internet users. Many anti-malware software packages and tools have been developed to protect legitimate users from these threats. However, legacy anti-malware methods are confronted with millions of potential malicious programs. To combat these threats, intelligent anti-malware systems utilizing machine learning (ML) models are useful. However, most ML models have limitations in performance since the training depth is usually limited. The emergence of Deep Learning (DL) models allows more training possibilities and improvement in performance. DL models often use gradient descent optimization, i.e., the Back-Propagation (BP) algorithm; therefore, their training and optimization procedures suffer from local sub-optimal solutions. In addition, DL-based malware detection methods often entail single classifiers. Ensemble learning overcomes the shortcomings of individual techniques by consolidating their strengths to improve the performance. In this paper, we propose an ensemble DL classifier stacked with the Fuzzy ARTMAP (FAM) model for malware detection. The stacked ensemble method uses several heterogeneous deep neural networks as the base learners. During the training and optimization process, these base learners adopt a hybrid BP and Particle Swarm Optimization algorithm to combine both local and global optimization capabilities for identifying optimal features and improving the classification performance. FAM is selected as a meta-learner to effectively train and combine the outputs of the base learners and achieve robust and accurate classification. A series of empirical studies with different benchmark data sets is conducted. The results ascertain that the proposed ensemble method is effective and efficient, outperforming many other compared methods.

An Improved Binary Owl Feature Selection in the Context of Android Malware Detection

Article

Nov 2022

Recently, the proliferation of smartphones, tablets, and smartwatches has raised security concerns from researchers. Android-based mobile devices are considered a dominant operating system. The open-source nature of this platform makes it a good target for malware attacks that result in both data exfiltration and property loss. To handle the security issues of mobile malware attacks, researchers proposed novel algorithms and detection approaches. However, there is no standard dataset used by researchers to make a fair evaluation. Most of the research datasets were collected from the Play Store or collected randomly from public datasets such as the DREBIN dataset. In this paper, a wrapper-based approach for Android malware detection has been proposed. The proposed wrapper consists of a newly modified binary Owl optimizer and a random forest classifier. The proposed approach was evaluated using standard data splits given by the DREBIN dataset in terms of accuracy, precision, recall, false-positive rate, and F1-score. The proposed approach reaches 98.84% and 86.34% for accuracy and F-score, respectively. Furthermore, it outperforms several related approaches from the literature in terms of accuracy, precision, and recall.
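The evaluation measures named in the abstract above (accuracy, precision, recall, false-positive rate, and F1-score) can all be derived from confusion-matrix counts. The sketch below shows the standard formulas and, using purely hypothetical counts for an imbalanced test split, illustrates why accuracy can sit well above the F1-score. The counts are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return common binary-detection metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "f1": f1,
    }

# Hypothetical imbalanced test split: 100 malware apps, 2,000 benign apps.
m = detection_metrics(tp=80, fp=10, tn=1990, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, the large number of correctly classified benign apps lifts accuracy to roughly 0.99, while the F1-score, which ignores true negatives, stays near 0.84.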

Android Malware Classification Using Optimized Ensemble Learning Based on Genetic Algorithms

Article

Nov 2022

The continuous increase in Android malware applications (apps) represents a significant danger to the privacy and security of users’ information. Therefore, effective and efficient Android malware app-classification techniques are needed. This paper presents a method for Android malware classification using optimized ensemble learning based on genetic algorithms. The suggested method is divided into two steps. First, a base learner is used to handle various machine learning algorithms, including support vector machine (SVM), logistic regression (LR), gradient boosting (GB), decision tree (DT), and AdaBoost (ADA) classifiers. Second, a meta learner RF-GA, utilizing genetic algorithm (GA) to optimize the parameters of a random forest (RF) algorithm, is employed to classify the prediction probabilities from the base learner. The genetic algorithm is used to optimize the parameter settings in the RF algorithm in order to obtain the highest Android malware classification accuracy. The effectiveness of the proposed method was examined on a dataset consisting of 5560 Android malware apps and 9476 goodware apps. The experimental results demonstrate that the suggested ensemble-learning strategy for classifying Android malware apps, which is based on an optimized random forest using genetic algorithms, outperformed the other methods and achieved the highest accuracy (94.15%), precision (94.15%), and area under the curve (AUC) (98.10%).

Performance Analysis of Machine Learning Methods with Class Imbalance Problem in Android Malware Detection

Article

May 2022

Due to the exponential rise of mobile technology, a slew of new mobile security concerns has surfaced recently. To address the hazards connected with malware, many approaches have been developed. Signature-based detection is the most widely used approach for detecting Android malware. This approach has the disadvantage of being unable to identify unknown malware. As a result of this issue, machine learning (ML) for detecting malware apps was created. Conventional ML methods are concerned with increasing classification accuracy. However, the standard classification method performs poorly in recognizing malware applications due to the unbalanced real-world datasets. In this study, an empirical analysis of the detection performance of ML methods in the presence of class imbalance is conducted. Specifically, eleven (11) ML methods with diverse computational complexities were investigated. Also, the synthetic minority oversampling technique (SMOTE) and random undersampling (RUS) are deployed to address the class imbalance in the Android malware datasets. The experimented ML methods are tested using the Malgenome and Drebin Android malware datasets that contain features gathered from both static and dynamic malware approaches. According to the experimental findings, the performance of each experimented ML method varies across the datasets. Moreover, the presence of class imbalance deteriorated the performance of the ML methods as their performances were amplified with the deployment of data sampling methods (SMOTE and RUS) used to alleviate the class imbalance problem. Besides, ML models with SMOTE technique are superior to ML models based on the RUS method. It is therefore recommended to address the inherent class imbalance problem in Android Malware detection.
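The two data sampling methods named in the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: random undersampling (RUS) discards majority-class samples at random, while SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The toy two-dimensional points, the class sizes, and the choice of k=2 are hypothetical.

```python
import random

def random_undersample(majority, minority, rng):
    """RUS: keep a random majority subset the same size as the minority class."""
    return rng.sample(majority, len(minority)), list(minority)

def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

rng = random.Random(0)
# Hypothetical feature vectors: 20 benign apps, 4 malware apps.
benign = [(rng.random(), rng.random()) for _ in range(20)]
malware = [(0.9, 0.9), (0.8, 1.0), (1.0, 0.8), (0.95, 0.85)]

kept_benign, kept_malware = random_undersample(benign, malware, rng)
new_malware = smote(malware, n_new=16, k=2, rng=rng)
balanced_malware = malware + new_malware

print(len(kept_benign), len(kept_malware))   # class sizes after RUS
print(len(benign), len(balanced_malware))    # class sizes after SMOTE
```

RUS balances the classes by throwing data away, whereas SMOTE balances them by adding interpolated samples, which is one plausible reason an oversampling approach can retain more information about the majority class.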

Empirical Analysis of Forest Penalizing Attribute and Its Enhanced Variations for Android Malware Detection

Article

May 2022

As a result of the rapid advancement of mobile and internet technology, a plethora of new mobile security risks has recently emerged. Many techniques have been developed to address the risks associated with Android malware. The most extensively used method for identifying Android malware is signature-based detection. The drawback of this method, however, is that it is unable to detect unknown malware. As a consequence of this problem, machine learning (ML) methods for detecting and classifying malware applications were developed. The goal of conventional ML approaches is to improve classification accuracy. However, owing to imbalanced real-world datasets, the traditional classification algorithms perform poorly in detecting malicious apps. As a result, in this study, we developed a meta-learning approach based on the forest penalizing attribute (FPA) classification algorithm for detecting malware applications. In other words, with this research, we investigated how to improve Android malware detection by applying empirical analysis of FPA and its enhanced variants (Cas_FPA and RoF_FPA). The proposed FPA and its enhanced variants were tested using the Malgenome and Drebin Android malware datasets, which contain features gathered from both static and dynamic Android malware analysis. Furthermore, the findings obtained using the proposed technique were compared with baseline classifiers and existing malware detection methods to validate their effectiveness in detecting malware application families. Based on the findings, FPA outperforms the baseline classifiers and existing ML-based Android malware detection models in dealing with the unbalanced family categorization of Android malware apps, with an accuracy of 98.94% and an area under curve (AUC) value of 0.999. Hence, further development and deployment of FPA-based meta-learners for Android malware detection and other cyber-security threats is recommended.

MFDroid: A Stacking Ensemble Learning Framework for Android Malware Detection

Article

Mar 2022
SENSORS-BASEL

As Android is a popular mobile operating system, Android malware is on the rise, which poses a great threat to user privacy and security. Considering the poor detection effects of the single feature selection algorithm and the low detection efficiency of traditional machine learning methods, we propose an Android malware detection framework based on stacking ensemble learning, MFDroid, to identify Android malware. In this paper, we used seven feature selection algorithms to select permissions, API calls, and opcodes, and then merged the results of each feature selection algorithm to obtain a new feature set. Subsequently, we used this to train the base learner, and set logistic regression as the meta-classifier, to learn the implicit information from the output of base learners and obtain the classification results. After the evaluation, the F1-score of MFDroid reached 96.0%. Finally, we analyzed each type of feature to identify the differences between malicious and benign applications. At the end of this paper, we present some general conclusions. In recent years, malicious applications and benign applications have been similar in terms of permission requests. In other words, a model trained only on permissions can no longer effectively or efficiently distinguish malicious applications from benign applications.
