Assembly code sequence for binary 4-g "00005068"

Source publication

A scalable multi-level feature extraction technique to detect malicious executables

Article

Mar 2008

We present a scalable and multi-level feature extraction technique to detect malicious executables. We propose a novel combination of three different kinds of features at different levels of abstraction. These are binary n-grams, assembly instruction sequences, and Dynamic Link Library (DLL) function calls; extracted from binary executables, disass...
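As a rough illustration of the first feature type named in this abstract, the sketch below counts overlapping byte-level n-grams in a binary and keys each one by its hex string, so the 4-byte window 00 00 50 68 shows up as "00005068". The function name `byte_ngrams` and the sample bytes are hypothetical; this is a minimal sketch under those assumptions, not the authors' extraction pipeline.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window, keyed by its hex string."""
    counts = Counter()
    for i in range(len(data) - n + 1):
        counts[data[i:i + n].hex()] += 1
    return counts


# Hypothetical sample bytes, chosen only so that the window 00 00 50 68 occurs.
sample = bytes.fromhex("6a0000005068000050")
grams = byte_ngrams(sample, n=4)
```

A sliding window with step 1 is assumed here; a real system would also need a feature selection step, since the number of distinct n-grams grows quickly with file count.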

Fig. 1: Search Interface (The Open University) search interface...

Fig. 4: The Bohemian Bookshelf search interface (Thudt, 2012, [p. 2])

Fig. 5: INVISQUE (Wong et al., 2011, p. 2)

Informationsvisualisierung und Retrieval [Information Visualization and Retrieval]

Article

Feb 2016

Ingeborg Jäger-Dengler-Harles

Selected recent visualization applications that concern the retrieval process are presented. The usage scenarios range from small-format mobile applications to large-format displays on high-resolution screens, and from integrated workstations for individual users to the use of interactive ...

DYNA-B: An Enhanced And Dynamic Batch Size Tuning For LSTM Neural Network

Preprint

Apr 2024

Deep neural networks can be trained in various domains using increasingly large batch sizes without compromising the efficiency. However, this massive data parallelism may differ from domain to domain. It is computationally challenging to train large deep neural networks on large datasets. In response, there has been a surge of interest in utilizing large batch size values during the optimization process, as it enables faster training of these networks, thereby facilitating distributed processing. However, this approach also presents a well-known problem called the "generalization gap," which can result in a degraded performance across multiple datasets. Currently, there is limited understanding of how to determine the optimal batch size. To address this issue, we propose an adaptive tuning algorithm that dynamically adjusts the batch size. Our algorithm consists of four stages: gradient warm-up, loss derivation, calculation of weighted loss using historical batch size data, and batch size updating. We demonstrate the superior performance of our algorithm compared with the traditional constant-batch size approach by comparing it with multiple system-call datasets of varying sizes.

A Novel Feature Extraction Technique for ECG Arrhythmia Classification Using ML

Conference Paper

Nov 2023

Feature extraction is the process of transforming raw data into features that are more relevant for machine learning algorithms. The goal of feature extraction is to find a set of features that can be used to accurately predict the target variable. The specific features that are extracted will depend on the specific application. For example, features that are extracted for the purpose of diagnosing arrhythmias will be different from the features that are extracted for the purpose of assessing myocardial infarction. A generalized new algorithm for feature extraction could be helpful for all complex feature extraction data sets. In this paper, we propose a random selection process to generate the required number of new features with the help of existing specific features of the electrocardiogram (ECG) signal. We have named this novel feature extraction method the Random Feature Explorer (RFE). The proposed method was tested and evaluated using PhysioNet's MIT-BIH datasets. The results indicate that the suggested method achieved an accuracy of 99.79% in arrhythmia classification. We have made the source code for our proposed method available on GitHub for open access and reproducibility. The code can be accessed at https://bit.ly/3NnrH4A

Static-RWArmor: A Static Analysis Approach for Prevention of Cryptographic Windows Ransomware

Conference Paper

Nov 2023

The everlasting fight between security researchers and ransomware authors, including cyber criminals who leverage ransomware to cripple organizations worldwide, has continued to evolve as novel techniques are used to evade ransomware detection. The victim not only endures paramount financial loss from business downtime for several days and/or paying ransom to regain control of their environment but also becomes at risk of being exposed to the stolen digital assets out on the Internet. To tackle these threats against ransomware, our research project aims to identify (1) structural similarities among 2,436 cryptographic Windows ransomware samples per calendar year between 2017 and 2021 and (2) structural dissimilarities against 3,014 benign applications using machine learning classifiers. We base our analysis on PE metadata for similarity analysis and binary classification tasks. With the Cosine Index, we capture 71% – 87.80% and 66% – 82.30% of similarities based on imports and function names feature spaces, respectively. On the other hand, after designing four experimental settings, Random Forest outperforms other applied classifiers by achieving 91.75%, 91.99%, 90.47%, and 91.05% at best for accuracy, precision, recall, and F1 scores, respectively, for ransomware detection.
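To make the similarity measure concrete, the sketch below computes a cosine similarity between two import lists treated as binary presence vectors, which reduces to the size of the intersection divided by the square root of the product of the set sizes. Whether the paper's "Cosine Index" is defined exactly this way is an assumption, and the function name `cosine_similarity` and the import lists are hypothetical.

```python
import math


def cosine_similarity(a: set, b: set) -> float:
    """Cosine similarity of two sets treated as binary presence vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical import lists; real ones would come from a PE metadata parser.
sample_a = {"kernel32.dll", "advapi32.dll", "crypt32.dll", "user32.dll"}
sample_b = {"kernel32.dll", "advapi32.dll", "crypt32.dll", "ws2_32.dll"}
score = cosine_similarity(sample_a, sample_b)  # 3 shared of 4 each -> 0.75
```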

Deep learning from physicochemical information of concrete with an artificial language for property prediction and reaction discovery

Article

Mar 2023
RESOUR CONSERV RECY

Existing machine learning-based approaches to investigate and design concrete mainly use the mixture design variables to predict concrete properties and do not consider the physicochemical properties of ingredients such as the particle size distribution and chemical composition of various binders and aggregates. This paper presents an approach to discover the intrinsic relationships between the physicochemical properties of the ingredients and mechanical properties of concrete. Specifically, this research creates an artificial language to represent concrete mixtures and the physicochemical information of their ingredients, develops a feature extraction method based on character-level N-grams, and proposes a method to configure deep learning models automatically. The proposed approach has been implemented to predict the compressive strength of complex concrete mixtures, assess the importance of variables, and discover chemical reactions, showing high accuracy and high generalizability. This research advances the capabilities of understanding the underlying reactions for complex concrete mixtures and designing low-carbon cost-effective concrete.

Data-Driven Malware Classification Assisted by Machine Learning Methods

Thesis

Dec 2022

Cassius Puodzius

Historically, malware (MW) analysis has heavily resorted to human savvy for manual signature creation to detect and classify MW. This procedure is very costly and time consuming, thus unable to cope with the modern cyber threat scenario. The solution is to widely automate MW analysis. Toward this goal, MW classification allows optimizing the handling of large MW corpora by identifying resemblances across similar instances. Consequently, MW classification figures as a key activity related to MW analysis, which is paramount in the operation of computer security as a whole. This thesis addresses the problem of MW classification taking an approach in which human intervention is spared as much as possible. Furthermore, we steer clear of subjectivity inherent to human analysis by designing MW classification solely on data directly extracted from MW analysis, thus taking a data-driven approach. Our objective is to improve the automation of malware analysis and to combine it with machine learning methods that are able to autonomously spot and reveal unwitting commonalities within data. We phased our work in three stages. Initially we focused on improving MW analysis and its automation, studying new ways of leveraging symbolic execution in MW analysis and developing a distributed framework to scale up our computational power. Then we concentrated on the representation of MW behavior, with painstaking attention to its accuracy and robustness. Finally, we fixed attention on MW clustering, devising a methodology that has no restriction in the combination of syntactical and behavioral features and remains scalable in practice. As for our main contributions, we revamp the use of symbolic execution for MW analysis with special attention to the optimal use of SMT solver tactics and hyperparameter settings; we conceive a new evaluation paradigm for MW analysis systems; we formulate a compact graph representation of behavior, along with a corresponding function for pairwise similarity computation, which is accurate and robust; and we elaborate a new MW clustering strategy based on ensemble clustering that is flexible with respect to the combination of syntactical and behavioral features.

Applying NLP techniques to malware detection in a practical environment

Article

Apr 2022
INT J INF SECUR

Executable files remain a popular means of compromising endpoint computers. These executable files are often obfuscated to evade anti-virus programs. Dynamic analysis requires too much time to examine all suspicious files from the Internet, so a fast filtering method is required. With the recent development of natural language processing (NLP) techniques, printable strings have become more effective for detecting malware, and the combination of printable strings and NLP techniques can be used as a filtering method. In this paper, we apply NLP techniques to malware detection and show that printable strings with NLP techniques are effective for detecting malware in a practical environment. Our dataset consists of more than 500,000 samples obtained from multiple sources. Our experimental results demonstrate that our method is effective not only against subspecies of existing malware but also against new malware. Our method is also effective against packed malware and anti-debugging techniques.
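The printable strings this abstract relies on can be pulled from a file in much the same way as the Unix `strings` utility does. The sketch below is one minimal way to do that, assuming ASCII-only strings and a minimum length of 4; the function name `printable_strings` and the sample bytes are hypothetical, and the NLP stage that would consume these strings is not shown.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII bytes that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


# Hypothetical file contents: short runs such as "MZ" and "ab" are dropped.
sample = b"\x00\x01MZ\x90\x00http://example.test/payload\x00\xffab\x00GetProcAddress\x00"
found = printable_strings(sample)
```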

Unified Detection of Obfuscated and Native Android Malware

Article

Jan 2022
CMC-COMPUT MATER CON

The Android operating system has become a leading smartphone platform for mobile and other smart devices, which in turn has led to a diversity of malware applications. The amount of research on Android malware detection has increased significantly in recent years and many detection systems have been proposed. Despite these efforts, however, most systems can be thwarted by sophisticated Android malware adopting obfuscation or native code to avoid discovery by anti-virus tools. In this paper, we propose a new static analysis technique to address the problems of obfuscating and native malware applications. The proposed system provides a unified technique for extracting features from applications and native libraries using a selection algorithm that can extract a small set of unique and effective features for detecting malware applications rapidly and with a high detection rate. Evaluation using large Android malware detection datasets obtained from various sources confirmed that the proposed approach achieves very promising results in terms of improved accuracy, low false positive rate, and high detection rate.

Malware classification for the cloud via semi-supervised transfer learning

Article

Dec 2020

Malware threats and privacy protection are two of the biggest challenges in the cloud computing environment. Many studies have focused on the accuracy of malware detection, but they did not sufficiently take into account the privacy protection of cloud tenants. This paper proposes a novel malware detection model, based on semi-supervised transfer learning (SSTL) for the cloud, that consists of detection, prediction, and transfer components. To protect the privacy of tenants in the public cloud, a byte classifier based on a recurrent neural network (RNN) for its detection component is designed to detect malware. However, because it is limited by the scarcity of training samples, the accuracy of the byte classifier is only 94.72% after supervised learning. An asm classifier is proposed for the prediction component, and it achieves 99.69% accuracy. The transfer component invokes the prediction component to classify an unlabeled dataset, and it combines the predicted labels and byte features of the unlabeled dataset into a new training dataset. Through the advantages of semi-supervised learning, the new dataset is transferred to the byte classifier for training again. The test results on the Kaggle malware datasets show that semi-supervised transfer learning improved the accuracy of the detection component from 94.72% to 96.9%. The improved malware detection method can not only do a better job of resolving the privacy concerns of tenants in the public cloud than other similar methods, but it can also detect malware more accurately.

The Defence of 2D Poisoning Attack

Preprint

Aug 2020

Zhuoran Tan

The poisoning attack, as one of the adversarial machine learning attacks, has become a severe threat to many artificial intelligence systems which use machine learning (ML) and deep learning (DL). Those systems can be re-trained using data collected during operations. With such a poisoning attack, the attackers can possibly evade those systems by disrupting the retraining. Furthermore, it has influenced the domain of cyber security [1, 2], which threatens many next-generation intrusion and anomaly detection systems. In particular, adversarial machine learning combined with malware has been developed [3], which increases the possibility for malware to evade detection systems. Such attacks targeting DL models are difficult to defend against. Even though there are some academic outcomes for that [3, 1, 2], there are not enough strategic tools in industry yet because of wide attack surfaces [4]. Moreover, there is not enough research about malware detection under poisoning attack. We took malware as our experiment object to learn the influence of the poisoning attack and also to explore strategic defence technology under 2D data. In order to prevent such an attack efficiently, we explored a combined approach with a generative neural network (GAN) and ensemble training. Moreover, we simulated the whole-stage process from data collection, feature extraction and detection to final defence. During our experiments, we found that some components, including the combination of merged classifiers, different hidden layers, number of units and test size, all influence the prevention performance against the poisoning attack. With some specific configurations, including automatic ratios for merged classifiers, optimized numbers for hidden layers, number of units and test size, the detection model for our generated ransomware dataset can achieve steady accuracy above 95% under adversarial one dimensional (2D) input. In order to verify the robustness, we tested our model with another open source malware dataset [5]. The detection accuracy on the malware dataset is near 100%, which is even better than on the ransomware one. The tests we conducted here are all grey-box attacks, assuming some pre-known knowledge for adversaries excluding some core configuration and parameter choices.

High-Accuracy Malware Classification with a Malware-Optimized Deep Learning Model

Preprint

Apr 2020

Malware threats are a serious problem for computer security, and the ability to detect and classify malware is critical for maintaining the security level of a computer. Recently, a number of researchers have been investigating techniques for classifying malware families using malware visualization, which converts the binary structure of malware into grayscale images. Although there have been many reports that applied CNNs to malware visualization image classification, it has not been revealed how to pick out a model that fits a given malware dataset and achieves higher classification accuracy. We propose a strategy to select a deep learning model that fits the malware visualization images. Our strategy uses the fine-tuning method for the pre-trained CNN model and a dataset that solves the imbalance problem. We chose the VGG19 model based on the proposed strategy to classify the Malimg dataset. Experimental results show that the classification accuracy is 99.72%, which is higher than other previously proposed malware classification methods.
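The visualization step mentioned above, turning a binary into a grayscale image, can be sketched in a few lines: each byte becomes one pixel intensity in the range 0 to 255, and the bytes are laid out row by row. The fixed row width and the zero padding of the last row are assumptions made for illustration, as is the function name `bytes_to_grayscale`; the abstract does not state how the images in the Malimg dataset were produced.

```python
def bytes_to_grayscale(data: bytes, width: int = 4) -> list:
    """Lay bytes out row by row; each byte is one pixel intensity (0-255)."""
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))  # zero-pad the last row
    return rows


# Ten hypothetical bytes at width 4 give three rows, the last one padded.
image = bytes_to_grayscale(bytes(range(10)), width=4)
```

The resulting nested list could then be converted to an array and resized to the input shape a pre-trained CNN expects.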
