Figure - uploaded by Md Shohel Rana
Top 20 malware families of DREBIN dataset.

Source publication
Article
Android is the most popular mobile operating system, with billions of active users worldwide, and it has attracted advertisers, hackers, and cybercriminals who develop malware for various purposes. Recently, extensive research has been conducted on malware analysis and detection for Android devices, while Android has also implemented...

Context in source publication

Context 1
... experiments, we also use this dataset, which contains a total of 123,453 Android application samples and as many as 545,333 behavioral features; 5,560 of the applications are malware samples from 179 different malware families, and 5,560 are benign samples. The samples were collected between August 2010 and October 2012; the top 20 malware families are listed in Table 1. ...

Similar publications

Article
The fast development of mobile apps and their usage has increased the risk of user privacy being exploited. One method used in the Android security mechanism is permission control, which restricts apps' access to the core facilities of devices. However, these permissions could be exploited by attackers when granting certain combinations of permissio...

Citations

... Given how often new malware families appear and how the malware landscape is always changing, this method is very helpful for detecting Android spyware. Studies (30) and (31) also appear in Table 5. Table 5 lists the various ensemble analysis methodologies together with the number and percentage of studies that fall under each category. ...
... It implies that the Static Ensemble method has received significant attention.
Static Ensembles: studies (8)–(13), (32)–(36); 11 studies (38%)
Dynamic Analysis Ensembles: studies (14)–(20), (37); 8 studies (27%)
Hybrid Feature Ensembles: studies (21)–(26); 6 studies (20%)
Structural Analysis Ensembles: studies (27)–(31) ...
... Zhao et al. [11] aimed to improve the accuracy of Android malware detection by employing boosting and bagging. Rana and Sung [12] improved the accuracy of Android malware detection by combining multiple machine learning classifiers through ensemble learning. Yerima and Sezer [13] proposed a novel multi-level classifier fusion approach: lower-level base classifiers are trained to generate models, a ranking algorithm selects the final classifier, and the prediction results of the selected classifier are then weighted according to the prediction accuracy of higher-level base classifiers. ...
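The idea of weighting predictions by accuracy, mentioned for Yerima and Sezer [13], can be illustrated with a minimal Python sketch. It assumes each base classifier outputs a malware probability and that the weights are simply the classifiers' validation accuracies normalized to sum to one. The function name, the example inputs, and the 0.5 threshold are hypothetical, and the sketch does not reproduce the multi-level ranking scheme itself.

```python
def weighted_vote(probs, accuracies, threshold=0.5):
    """Fuse per-classifier malware probabilities with accuracy-based weights.

    probs[i] is classifier i's estimated probability that an app is malware;
    accuracies[i] is that classifier's accuracy on held-out validation data.
    Returns (fused probability, label), where label 1 means malware.
    """
    if not probs or len(probs) != len(accuracies):
        raise ValueError("need one accuracy value per classifier")
    total = sum(accuracies)
    # Normalize the accuracies so the weights sum to 1.
    weights = [a / total for a in accuracies]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, int(fused >= threshold)


# Three hypothetical base classifiers disagree about one app.
score, label = weighted_vote([0.9, 0.4, 0.7], [0.95, 0.80, 0.90])
print(round(score, 3), label)  # 0.681 1
```

More accurate classifiers pull the fused score toward their own estimate, so the dissenting 0.4 vote from the weakest classifier is outweighed.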
Article
Machine learning-based malware (malicious software) detection methods have a wide range of real-world applications. However, these approaches suffer from the fatal problem of "model aging", in which the validity of the model decreases rapidly as malware continues to evolve and new variants continuously emerge. Model aging is usually addressed by retraining the model, which requires a large number of labeled samples that are expensive to obtain. To address this challenge, this paper proposes a Transformer-based semi-supervised continuous learning model for malware detection. First, the model improves the lifelong semi-supervised mixture algorithm to dynamically adjust the weighted combination of new and historical sample sequences, which addresses the imbalance problem. Second, the Learning with Local and Global Consistency algorithm is used to iteratively compute similarity scores for the unlabeled samples among the mixed samples to obtain pseudo-labels. Finally, a Multilayer Perceptron is applied for malware classification. To validate the effectiveness of the model, this paper conducts experiments on the CICMalDroid2020 dataset. The experimental results show that the proposed model performs better than existing deep learning detection models: in binary classification, its F1 score is on average 1.27% higher than those of the other models. Moreover, after hybrid samples containing both historical and new data are input four times, its F1 score remains 1.96% higher than those of the other models.
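The pseudo-labeling step relies on Learning with Local and Global Consistency (Zhou et al., 2004), which spreads known labels over a similarity graph. The sketch below implements the standard iteration F ← αSF + (1 − α)Y, with S = D^(-1/2) W D^(-1/2), using plain Python lists. The Gaussian kernel width sigma, alpha = 0.99, the fixed iteration count, and the toy points are illustrative assumptions; the code is not the paper's Transformer-based pipeline.

```python
import math


def llgc(points, labels, n_classes, sigma=1.0, alpha=0.99, iters=200):
    """Learning with Local and Global Consistency (label spreading).

    points: list of feature vectors; labels: class index for labeled points,
    None for unlabeled ones. Returns a pseudo-label for every point.
    Assumes every point has non-negligible affinity to at least one other.
    """
    n = len(points)
    # Gaussian affinity matrix W with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Y: one-hot rows for labeled points, all-zero rows for unlabeled ones.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # F <- alpha * S F + (1 - alpha) * Y, computed from the previous F.
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Each point takes the class with the largest accumulated score.
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Two well-separated clusters, one labeled point in each.
pts = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
labs = [0, None, None, 1, None, None]
print(llgc(pts, labs, n_classes=2))  # [0, 0, 0, 1, 1, 1]
```

Points that are strongly connected to a labeled point inherit its label, which is the "local and global consistency" assumption the method is named after.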
... A support vector machine (SVM) was then used as a fusion classifier to learn implicit supplementary information from the ensemble members' outputs and make final predictions. Using an ensemble-based learning methodology, Rana and Sung (2020) proposed and examined a number of machine-learning techniques for identifying Android malware, including a substring-based classifier feature selection (SBFS) strategy. They applied an ensemble learning technique to the DREBIN dataset to obtain improved results. ...
Article
Since its advent, malware has taken a heavy toll on a world that exchanges billions of data items daily. Millions of people fall victim to it, and the numbers are not decreasing over the years. Malware comes in various types, of which obfuscated malware is a special kind. Detecting obfuscated malware is necessary because it usually evades detection and is prevalent in the real world. Although numerous works have already been done in this field, most of them still fall short in some respects, considering the scope for exploration opened up by recent extensions. In addition, hybrid classification models have yet to be widely applied in this field. Thus, in this paper, a novel hybrid classification model named MalHyStack is proposed for detecting such obfuscated malware within the network. The proposed model is built on a stacked ensemble learning scheme in which conventional machine learning algorithms, namely the Extremely Randomized Trees classifier (ExtraTrees), the Extreme Gradient Boosting (XGBoost) classifier, and Random Forest, form the first layer, followed by a deep learning layer in the second stage. Before the classification model is used for malware detection, an optimal subset of features is selected using Pearson correlation analysis, which improved the accuracy of the model by more than 2% for multiclass classification. It also reduces time complexity approximately twofold and threefold for binary and multiclass classification, respectively. To evaluate the performance of the proposed model, a recently published balanced dataset named CIC-MalMem-2022 was used. On this dataset, the overall experimental results show that the proposed model outperforms existing classification models.
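The abstract does not say exactly how the Pearson correlation analysis selects the feature subset. One common interpretation, assumed here, is a redundancy filter: scan the features in order and drop any feature whose absolute correlation with an already-kept feature exceeds a threshold (0.9 below, chosen arbitrarily). The feature names and values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # treat constant features as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    columns maps feature name -> list of values. Returns kept names in order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],  # exactly 2 * f1, so r = 1 and it is dropped
    "f3": [5, 3, 4, 1, 2],   # r = -0.8 with f1, below the threshold
}
print(drop_correlated(features))  # ['f1', 'f3']
```

A filter like this shrinks the input the classifiers see, which is one plausible source of the reported reduction in time complexity.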
... In [45], a multifaceted deep Generative Adversarial Network model was proposed for Android malware detection. Meanwhile, various ML algorithms, i.e., RF, DT, LR, and SVM, were evaluated in [46] using ensemble learning for Android malware identification. Overall, the abovementioned methods demonstrate that ensemble learning is useful for improving the performance of individual malware detection models. ...
... Specifically, seven state-of-the-art single classifier methods published between 2020 and 2022 have been used, i.e., GCN-JK [33], GCN-AMD [32], GIN-JK [33], VGAEMalGAN [34], SAGE-JK [33], DeepDiveDrebin [38], and FMulAMD [39]. Furthermore, nine state-of-the-art ensemble methods have been used, i.e., FSEC-MD [40], PIndroid [13], TA-AMD [41], EAMP-EML [43], ERE-AMD [42], MDGAN-MD [45], Stacking DT-SVM-LR [46], and Blending DT-SVM-LR [46]. ...
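The comparison above lists both "Stacking DT-SVM-LR" and "Blending DT-SVM-LR". The usual difference between the two lies in how the meta-learner's training data are produced. Stacking uses out-of-fold predictions from k-fold cross-validation, so every training sample receives a prediction from a model that never saw it. Blending trains the base models on one split and fits the meta-learner only on predictions for a separate holdout split. The helper names below are hypothetical, and the splits are contiguous (no shuffling) to keep the sketch short.

```python
def kfold_indices(n, k):
    """Yield (train, held_out) index lists for k contiguous folds (stacking)."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, held_out
        start += size


def blending_split(n, holdout_frac=0.2):
    """Return (base_train, holdout) index lists for blending."""
    n_hold = round(n * holdout_frac)
    cut = n - n_hold
    return list(range(cut)), list(range(cut, n))


# Stacking: each of the 10 samples is held out exactly once across 3 folds.
folds = list(kfold_indices(10, 3))
print([held for _, held in folds])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Blending: only the last 2 samples feed the meta-learner.
print(blending_split(10))  # ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
```

Stacking makes fuller use of the data for the meta-learner at the cost of training each base model k times, while blending trains each base model once but gives the meta-learner fewer examples.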
Article
Digital networks and systems are susceptible to malicious software (malware) attacks. Deep learning (DL) models have recently emerged as effective methods for classifying and detecting malware. However, DL models often rely on gradient descent optimization for learning, i.e., the Back-Propagation (BP) algorithm; therefore, their training and optimization procedures suffer from several limitations, such as high computational cost and local suboptimal solutions. Ensemble methods, on the other hand, overcome the shortcomings of individual models by consolidating their strengths to increase performance. In this paper, we propose an ensemble-based parallel DL classifier for malware detection. In particular, a stacked ensemble learning method is developed that uses five DL base models and a neural network as the meta model. The DL models are trained and optimized with a hybrid optimization method based on the BP and Particle Swarm Optimization (PSO) algorithms. To improve the scalability and efficiency of the ensemble method, a parallel computing framework is exploited. The proposed ensemble method is evaluated on five malware datasets (namely, Drebin, NTAM, TOP-PE, DikeDataset, and ML_Android), achieving accuracy rates of 99.2%, 99.3%, 98.7%, 100%, and 100%, respectively. Its parallel implementation also increases computational speed by a factor of up to 6.75. These results confirm that the proposed ensemble method is effective, efficient, and scalable, outperforming many other compared methods in malware detection.
Index Terms: Ensemble method, malware detection, deep learning, parallel processing, backpropagation algorithm, particle swarm optimization.
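Both stacked-ensemble abstracts here train their DL base models with a hybrid of BP and Particle Swarm Optimization. The sketch below shows only the PSO half: a standard global-best PSO minimizing a toy objective (the sphere function) that stands in for a network's loss. The inertia and acceleration coefficients are commonly used values assumed here, not taken from the papers, and the BP hybridization is not shown.

```python
import random


def pso(objective, dim, n_particles=20, iters=100, w=0.729, c1=1.494, c2=1.494,
        bounds=(-5.0, 5.0), seed=0):
    """Global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Minimize the sphere function sum(x_i^2); the true minimum is 0 at the origin.
best, value = pso(lambda x: sum(v * v for v in x), dim=3)
print(best, value)
```

Because PSO needs only objective values and no gradients, it can explore globally and escape poor regions, while BP refines weights locally; that complementary behavior is the motivation the abstracts give for the hybrid.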
... In [43], an Android malware detection method based on a multifaceted deep generative adversarial network model was proposed. Various ML models were assessed in [44] by applying an ensemble learning method to detect Android malware. Overall, the abovementioned methods indicate that the ensemble approach is effective for malware detection tasks. ...
... Specifically, seven state-of-the-art single classifiers published in 2020-2022 are employed, i.e., GCN-AMD [32], GCN-JK [33], GIN-JK [33], SAGE-JK [33], VGAEMalGAN [34], DeepDiveDrebin [37], and FMulAMD [38]. Nine state-of-the-art ensemble methods are also used, i.e., PIndroid [13], FSEC-MD [39], TA-AMD [40], ERE-AMD [41], EAMP-EML [42], MDGAN-MD [43], Stacking DT-SVM-LR [44], Blending DT-SVM-LR [44]. ...
Article
Malicious software, or malware, poses serious and evolving security threats to Internet users. Many anti-malware software packages and tools have been developed to protect legitimate users from these threats. However, legacy anti-malware methods are confronted with millions of potentially malicious programs. To combat these threats, intelligent anti-malware systems that use machine learning (ML) models are useful. However, most ML models have limited performance because their training depth is usually limited. The emergence of Deep Learning (DL) models allows more training possibilities and improved performance. DL models often use gradient descent optimization, i.e., the Back-Propagation (BP) algorithm; therefore, their training and optimization procedures can become trapped in local suboptimal solutions. In addition, DL-based malware detection methods often rely on single classifiers. Ensemble learning overcomes the shortcomings of individual techniques by consolidating their strengths to improve performance. In this paper, we propose an ensemble DL classifier stacked with the Fuzzy ARTMAP (FAM) model for malware detection. The stacked ensemble method uses several heterogeneous deep neural networks as base learners. During training and optimization, these base learners adopt a hybrid BP and Particle Swarm Optimization algorithm that combines local and global optimization capabilities to identify optimal features and improve classification performance. FAM is selected as the meta-learner to effectively train on and combine the outputs of the base learners and achieve robust and accurate classification. A series of empirical studies on different benchmark data sets is conducted. The results confirm that the proposed ensemble method is effective and efficient, outperforming many other compared methods.
... Table 7 compares the proposed approach with different ensemble approaches such as Decision Tree (DT), Random Forest (RF), Extremely Randomized Trees (ERT), Support Vector Machine (SVM), Logistic Regression (LR), and Gradient Boosting (GB). Note that [44] used bagging (RF) as its ensemble approach, the same ensemble approach used in this paper. Our proposed approach achieved better precision, recall, and accuracy than the other approaches, since the whole approach depends on both the feature selection used and the classifier. ...
Article
Recently, the proliferation of smartphones, tablets, and smartwatches has raised security concerns among researchers. Android is considered the dominant operating system for mobile devices. The open-source nature of this platform makes it a good target for malware attacks that result in both data exfiltration and property loss. To handle the security issues of mobile malware attacks, researchers have proposed novel algorithms and detection approaches. However, there is no standard dataset that researchers use for fair evaluation. Most research datasets were collected from the Play Store or randomly from public datasets such as the DREBIN dataset. In this paper, a wrapper-based approach for Android malware detection is proposed. The proposed wrapper consists of a newly modified binary Owl optimizer and a random forest classifier. The proposed approach was evaluated on the standard data splits provided with the DREBIN dataset in terms of accuracy, precision, recall, false-positive rate, and F1-score. It reaches 98.84% accuracy and an 86.34% F-score. Furthermore, it outperforms several related approaches from the literature in terms of accuracy, precision, and recall.
... Rana et al. [19] suggested and analyzed multiple machine learning techniques for classifying Android malware, coupled with a substring-based classifier feature selection (SBFS) strategy using an ensemble-based learning methodology. They used the Drebin dataset and an ensemble learning approach to achieve better results. ...
Article
The continuous increase in Android malware applications (apps) represents a significant danger to the privacy and security of users’ information. Therefore, effective and efficient Android malware app-classification techniques are needed. This paper presents a method for Android malware classification using optimized ensemble learning based on genetic algorithms. The suggested method is divided into two steps. First, a base learner is used to handle various machine learning algorithms, including support vector machine (SVM), logistic regression (LR), gradient boosting (GB), decision tree (DT), and AdaBoost (ADA) classifiers. Second, a meta learner RF-GA, utilizing genetic algorithm (GA) to optimize the parameters of a random forest (RF) algorithm, is employed to classify the prediction probabilities from the base learner. The genetic algorithm is used to optimize the parameter settings in the RF algorithm in order to obtain the highest Android malware classification accuracy. The effectiveness of the proposed method was examined on a dataset consisting of 5560 Android malware apps and 9476 goodware apps. The experimental results demonstrate that the suggested ensemble-learning strategy for classifying Android malware apps, which is based on an optimized random forest using genetic algorithms, outperformed the other methods and achieved the highest accuracy (94.15%), precision (94.15%), and area under the curve (AUC) (98.10%).
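The GA-based tuning step can be sketched briefly. The code below is a generic genetic algorithm (tournament selection, uniform crossover, random-reset mutation, and elitism) over two integer random-forest hyperparameters. The search ranges, operator choices, and `toy_fitness` function are all hypothetical. In practice the fitness would be the cross-validated accuracy of a random forest trained with those settings, which is omitted here to keep the example dependency-free.

```python
import random

# Hypothetical search space for two random-forest hyperparameters.
SPACE = {"n_estimators": (10, 500), "max_depth": (2, 40)}


def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(ind, rng, rate=0.2):
    # Random-reset mutation: resample each gene within its bounds with probability `rate`.
    return {k: (rng.randint(*SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[:2]  # carry the two best over unchanged (elitism)

        def tournament():
            # Pick the fittest of three randomly chosen individuals.
            return max(rng.sample(scored, 3), key=fitness)

        children = [mutate(crossover(tournament(), tournament(), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)


def toy_fitness(p):
    # Stand-in for cross-validated accuracy; peaks at 200 trees and depth 20.
    return -abs(p["n_estimators"] - 200) / 500 - abs(p["max_depth"] - 20) / 40


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Elitism guarantees that the best fitness never decreases between generations, which is why a small population and few generations usually suffice for a two-parameter search like this one.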
... In the experimental section of this research, two Android malware datasets were employed. These datasets are widely available and often used in current research [6,32,45–47]. The first dataset (Drebin) consists of 15,036 instances (5,560 malware and 9,476 benign). ...
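The dataset figures quoted above can be checked quickly, and the check also shows how imbalanced this Drebin subset is. A short Python calculation using only the counts stated in the text:

```python
malware, benign = 5_560, 9_476
total = malware + benign
print(total)                            # 15036
print(round(malware / total * 100, 2))  # 36.98 (percent malware)
print(round(benign / malware, 2))       # 1.7 (benign-to-malware ratio)
```

Roughly 37% of the instances are malware, so benign apps outnumber malware by about 1.7 to 1. This is the kind of skew that the class-imbalance studies cited here try to correct.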
Article
Due to the exponential rise of mobile technology, a slew of new mobile security concerns has surfaced recently. Many approaches have been developed to address the hazards associated with malware. Signature-based detection is the most widely used approach for detecting Android malware, but it cannot identify unknown malware. To address this issue, machine learning (ML) methods for detecting malware apps were developed. Conventional ML methods focus on increasing classification accuracy. However, standard classification methods perform poorly at recognizing malware applications because real-world datasets are imbalanced. In this study, an empirical analysis of the detection performance of ML methods in the presence of class imbalance is conducted. Specifically, eleven (11) ML methods with diverse computational complexities were investigated. In addition, the synthetic minority oversampling technique (SMOTE) and random undersampling (RUS) were deployed to address the class imbalance in the Android malware datasets. The ML methods were tested on the Malgenome and Drebin Android malware datasets, which contain features gathered through both static and dynamic malware analysis. According to the experimental findings, the performance of each ML method varies across the datasets. Moreover, class imbalance degraded the performance of the ML methods, since their performance improved when the data sampling methods (SMOTE and RUS) were deployed to alleviate the class imbalance problem. In addition, ML models using SMOTE were superior to those using RUS. It is therefore recommended that the inherent class imbalance problem in Android malware detection be addressed.
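The two resampling techniques named above can be sketched in plain Python. The SMOTE sketch follows the basic idea of Chawla et al. (2002): pick a minority sample, pick one of its k nearest minority neighbors, and create a synthetic point at a random position on the segment between them. Random undersampling simply keeps a random subset of the majority class. The function names, k = 5, and the toy points are assumptions for illustration; production work would normally use a maintained library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority samples and their neighbors.

    minority must contain at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself).
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Random undersampling: keep a random subset of the majority class."""
    return random.Random(seed).sample(majority, n_keep)


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
majority = [[float(i), 0.0] for i in range(10)]
kept = random_undersample(majority, n_keep=4)
print(len(new_points), len(kept))  # 6 4
```

Because SMOTE interpolates, it produces fractional values. Applied directly to binary Drebin-style features, it would therefore yield non-binary vectors; the abstract does not say how the study handled this.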
... Two Android malware datasets (Drebin and Malgenome) were used in the experimentation phase of this study. These datasets are publicly accessible and are often used in contemporary Android malware studies [9,63–66]. ...
... Clearly, the proposed models (FPA, Cas_FPA, and RoF_FPA) showed significant improvement over the existing Android malware detection solutions ... Rana and Sung [66]; and Rathore, Sahay, Chaturvedi, and Sewak [34]. These existing Android malware detection models were trained and tested on the same Drebin dataset used in the present research. ...
... Salah, Shalabi, and Khedr [75] used a feature-selection-based framework for Android malware detection that achieved a detection accuracy of 94%. Similarly, the solutions of Rana and Sung [66] and of Rathore, Sahay, Chaturvedi, and Sewak [34] achieved detection accuracies of 97.24% and 97.92%, respectively. Although these existing methods achieved relatively good detection performance, they were still outperformed by the proposed FPA and its enhanced variants (Cas_FPA and RoF_FPA). ...
Article
As a result of the rapid advancement of mobile and internet technology, a plethora of new mobile security risks has recently emerged. Many techniques have been developed to address the risks associated with Android malware. The most extensively used method for identifying Android malware is signature-based detection; its drawback, however, is that it cannot detect unknown malware. As a consequence, machine learning (ML) methods for detecting and classifying malware applications were developed. The goal of conventional ML approaches is to improve classification accuracy. However, owing to imbalanced real-world datasets, traditional classification algorithms perform poorly in detecting malicious apps. Therefore, in this study, we developed a meta-learning approach based on the forest penalizing attribute (FPA) classification algorithm for detecting malware applications. Specifically, we investigated how to improve Android malware detection through an empirical analysis of FPA and its enhanced variants (Cas_FPA and RoF_FPA). The proposed FPA and its enhanced variants were tested on the Malgenome and Drebin Android malware datasets, which contain features gathered from both static and dynamic Android malware analysis. Furthermore, the results obtained with the proposed technique were compared with those of baseline classifiers and existing malware detection methods to validate their effectiveness in detecting malware application families. Based on the findings, FPA outperforms the baseline classifiers and existing ML-based Android malware detection models in handling the imbalanced family classification of Android malware apps, with an accuracy of 98.94% and an area under the curve (AUC) value of 0.999. Hence, further development and deployment of FPA-based meta-learners for Android malware detection and other cyber-security threats is recommended.
... Rana et al. [20] proposed and evaluated various machine learning algorithms, applying an ensemble-based learning approach combined with a substring-based classifier feature selection (SBFS) strategy to identify Android malware. They used the DREBIN dataset and achieved better results with an ensemble learning approach. ...
Article
As Android is a popular mobile operating system, Android malware is on the rise, posing a great threat to user privacy and security. Considering the poor detection performance of single feature selection algorithms and the low detection efficiency of traditional machine learning methods, we propose MFDroid, an Android malware detection framework based on stacking ensemble learning, to identify Android malware. In this paper, we used seven feature selection algorithms to select permissions, API calls, and opcodes, and then merged the results of the feature selection algorithms to obtain a new feature set. We then used this feature set to train the base learners and used logistic regression as the meta-classifier to learn the implicit information in the base learners' outputs and obtain the classification results. In the evaluation, the F1-score of MFDroid reached 96.0%. Finally, we analyzed each type of feature to identify the differences between malicious and benign applications. At the end of this paper, we present some general conclusions. In recent years, malicious and benign applications have been similar in terms of permission requests. In other words, a model trained only on permissions can no longer effectively or efficiently distinguish malicious applications from benign ones.
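The two MFDroid steps described above, merging the outputs of several feature selectors and fitting a logistic-regression meta-classifier on base-learner outputs, can be sketched as follows. The abstract does not state the merge rule, so a set union is assumed here. The meta-classifier is a plain batch-gradient-descent logistic regression. The feature names, probabilities, learning rate, and epoch count are hypothetical. In a real stacking setup the meta-features would be out-of-fold predictions from the base learners.

```python
import math


def merge_selected(*selections):
    """Union of the feature sets chosen by several selectors (assumed merge rule)."""
    merged = set()
    for chosen in selections:
        merged |= set(chosen)
    return sorted(merged)


def train_logistic_meta(meta_x, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    meta_x[i] holds the base learners' predicted malware probabilities for app i.
    """
    n_feat = len(meta_x[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(meta_x, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1 / (1 + math.exp(-z)) - yi  # log-loss gradient w.r.t. z
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(y) for wj, g in zip(w, gw)]
        b -= lr * gb / len(y)
    return w, b


def predict_meta(w, b, xi):
    """Return 1 (malware) if the meta-classifier's probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return int(1 / (1 + math.exp(-z)) >= 0.5)


features = merge_selected(["SEND_SMS", "INTERNET"], ["INTERNET", "getDeviceId"])
print(features)  # ['INTERNET', 'SEND_SMS', 'getDeviceId']

# Outputs of three hypothetical base learners for four labeled apps.
meta_x = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
y = [1, 1, 0, 0]
w, b = train_logistic_meta(meta_x, y)
print(predict_meta(w, b, [0.85, 0.75, 0.8]), predict_meta(w, b, [0.15, 0.2, 0.1]))  # 1 0
```

A linear meta-classifier like this learns how much to trust each base learner, which is one reading of "learning the implicit information" in the base learners' outputs.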