Figure - uploaded by Morteza Hosseini
The authenticated encryption operation of AES in GCM mode. To encrypt, counter blocks produced by successive counter increments are encrypted with AES under the key K, and the results are XORed with the plaintext to produce the ciphertext. To verify data authenticity, a block of additional authenticated data, “Auth Data 1”, is fed to a Galois Mult block, whose output is used to generate the final authentication tag. Galois Mult performs multiplication in the Galois field GF(2^128) using the hash key H. Note: “Incr” denotes the counter-increment function.
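The two primitives in the figure, “Incr” and “Galois Mult”, can be sketched in Python. This minimal illustration follows the bit conventions of NIST SP 800-38D, where the counter increment is the function inc32 and the multiplication is the one chained by GHASH. The function names are our own, the AES encryption of the counter blocks is omitted, and the loop is not constant-time, so it is an aid to understanding rather than a usable implementation.

```python
# Blocks are 128-bit integers read big-endian. In the SP 800-38D convention the
# leftmost bit is the coefficient of x^0, so "multiply by x" is a right shift.

# Reduction constant R = 11100001 || 0^120, encoding x^128 = 1 + x + x^2 + x^7.
R = 0xE1 << 120


def inc32(counter_block: int) -> int:
    """Increment the rightmost 32 bits of a counter block modulo 2^32."""
    high = counter_block & ~0xFFFFFFFF  # leftmost 96 bits stay unchanged
    low = (counter_block + 1) & 0xFFFFFFFF
    return high | low


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GF(2^128) elements with the bitwise shift-and-add method."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # i-th bit of x, counting from the left
            z ^= v
        # v <- v * x: shift right, and reduce when a bit falls off the end.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


def ghash(h: int, blocks: list[int]) -> int:
    """Fold 128-bit blocks through Galois Mult with the hash key H."""
    y = 0
    for block in blocks:
        y = gf128_mul(y ^ block, h)
    return y
```

In GCM, the additional authenticated data and the ciphertext blocks are folded together in this way, and the result is combined with an encrypted counter block to form the tag.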

Source publication
Article
The ever-increasing growth of high-throughput sequencing technologies has led to a great acceleration of medical and biological research and discovery. As these platforms advance, the amount of information for diverse genomes increases at unprecedented rates. Confidentiality, integrity and authenticity of such genomic information should be ensured d...

Similar publications

Article
Escherichia coli is a priority foodborne pathogen of public health concern and phenotypic serotyping provides critical information for surveillance and outbreak detection activities. Public health and food safety laboratories are increasingly adopting whole-genome sequencing (WGS) for characterizing pathogens, but it is imperative to maintain serot...
Preprint
The clinical presentation overlap between malaria and COVID-19 poses special challenges for rapid diagnosis in febrile children. In this study, we collected RNA-seq data of children with malaria and COVID-19 infection from the public databases as raw data in fastq format paired end files. A group of six, five and two biological replicates of malari...
Article
In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv as an ultra-fast tool to detect microbial sequences present in sequencing data, identify target microorganisms and visualize coverage of microbial genomes. This tool is based...
Article
Repetitive DNA sequences cause genomic instability and are important genetic markers. Identification of repeats is a critical step in genome annotation and analysis. On the other hand, repeats also pose a technical challenge for genome assembly and alignment programs using NGS data. RFGR is a comprehensive tool that can find exact repetitive sequen...
Preprint
Motivation Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution although the physiologic...

Citations

... Privacy and security are paramount when dealing with sensitive information, such as genomics, which may have implications for individuals' health, ancestry, and even predispositions to certain diseases. To address these concerns, researchers have proposed various methods for ensuring the secure encryption of genomic data [10]. ...
... Cryfa uses AES (Advanced Encryption Standard) encryption [12], [13] combined with a shuffling mechanism to enhance security against low data complexity attacks. As a result, Cryfa can provide confidentiality, integrity, and authenticity of genomic data faster and with smaller file sizes than the general-purpose encryption tool AES Crypt [10]. Although this tool solves one part of the problem, distributing keys between communicating parties remains an open problem. ...
... Meanwhile, the size of data can be reduced through various data compression methods during preservation and transmission, such as Arithmetic Coding (AC), Huffman Coding (HC), and Lempel-Ziv (LZ) [14,15]. Data compression changes the input data into a smaller, compressed format by removing redundancies. ...
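As an illustration of one of the methods listed, Huffman coding, the sketch below builds a prefix-free code from symbol frequencies. Frequent symbols receive shorter codewords, which is how redundancy is removed. It is a textbook construction with helper names of our own, not the coder used by any of the cited tools.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Return a prefix-free binary codeword for every symbol in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (total count, unique tie-breaker, {symbol: codeword so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text: str) -> str:
    """Concatenate the codewords of all symbols in text."""
    codes = huffman_code(text)
    return "".join(codes[ch] for ch in text)
```

For "aaaabbc", the symbol a gets a 1-bit codeword while b and c get 2 bits each, so the whole string needs 10 bits instead of 56 at 8 bits per character.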
... Following compression, the bits-per-character (BPC) value decreases, indicating a reduction in the number of bits needed to represent each character. Eq. (14) can be used to calculate this BPC value. ...
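Eq. (14) itself is not reproduced on this page. Assuming the usual definition of bits per character (BPC), namely the compressed size in bits divided by the number of input characters, it can be computed as below. Uncompressed 8-bit text scores 8.0. The repetitive input sequence and the use of zlib are illustrative choices only.

```python
import zlib


def bits_per_character(original: bytes, compressed: bytes) -> float:
    """BPC: compressed size in bits over the number of input characters."""
    if not original:
        raise ValueError("original input must be non-empty")
    return 8 * len(compressed) / len(original)


sequence = b"ACGT" * 1000  # hypothetical, highly repetitive DNA-like input
print(bits_per_character(sequence, sequence))  # 8.0 before compression
print(bits_per_character(sequence, zlib.compress(sequence)))  # far below 8.0
```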
Article
Data stored on physical storage devices and transmitted over communication channels often contain a lot of redundant information, which can be reduced through compression to conserve space and reduce transmission time. The need for adequate security measures, such as secret-key control in specific techniques, raises concerns about data exposure to potential attacks. Encryption plays a vital role in safeguarding information and maintaining its confidentiality by using a secret key to make the data unreadable and unalterable. This paper tackles the challenge of simultaneously compressing and encrypting data without affecting the efficacy of either process. The authors propose an efficient and secure compression method that incorporates a secret key to accomplish this goal. Encoding involves scrambling the input data with a generated key and then transforming it with the Burrows-Wheeler Transform (BWT). The BWT output is then compressed with both the Move-To-Front Transform and Run-Length Encoding. This method blends the cryptographic principles of confusion and diffusion into the compression process, enhancing its performance. The proposed technique is geared towards providing robust encryption and sufficient compression. Experimental results show that it outperforms other techniques in terms of compression ratio. A security analysis based on the unicity distance determined that the technique is sensitive to both the secret key and the plaintext. Additionally, the proposed technique achieved a compression ratio close to 90% across all the test text files.
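The BWT, Move-To-Front and Run-Length Encoding pipeline described in the abstract can be sketched as follows. This is a naive, quadratic-time illustration that omits the key-dependent scrambling step and uses a NUL byte as an assumed end-of-string sentinel. It is not the authors' implementation.

```python
SENTINEL = b"\x00"  # assumed end-of-string marker; must not occur in the input


def bwt(data: bytes) -> bytes:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    if SENTINEL in data:
        raise ValueError("input must not contain the sentinel byte")
    s = data + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: bytes) -> bytes:
    """Rebuild the sorted rotation table column by column, then pick the original row."""
    n = len(last_column)
    table = [b""] * n
    for _ in range(n):
        table = sorted(bytes([last_column[i]]) + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(SENTINEL))
    return original[:-1]


def mtf_encode(data: bytes) -> list[int]:
    """Move-To-Front: emit each byte's current position, then move it to the front."""
    symbols = list(range(256))
    out = []
    for b in data:
        i = symbols.index(b)
        out.append(i)
        symbols.insert(0, symbols.pop(i))
    return out


def mtf_decode(indices: list[int]) -> bytes:
    """Invert mtf_encode by replaying the same list updates."""
    symbols = list(range(256))
    out = bytearray()
    for i in indices:
        b = symbols.pop(i)
        out.append(b)
        symbols.insert(0, b)
    return bytes(out)


def rle_encode(values: list[int]) -> list[tuple[int, int]]:
    """Run-Length Encoding as (value, run length) pairs."""
    runs: list[list[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, count) for v, count in runs]


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run length) pairs back into a flat list."""
    return [v for v, count in runs for _ in range(count)]
```

Encoding chains the steps as `rle_encode(mtf_encode(bwt(data)))`, and decoding applies the inverses in reverse order. The BWT groups equal characters together, MTF turns those groups into runs of small indices, and RLE collapses the runs.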
... An automatic application that assesses the security characteristics of a design and extracts cryptographic parameters that provide up-to-date security would be a powerful tool. (2) Attack Assessment. A tool covering a variety of security attacks is a useful pre-decision assessment for developers to consider [175]. Suitable cryptographic attacks can be converted into an automated, tool-based strategy to assess and validate a scheme's strength. ...
Article
Internet of Things (IoT) is a promising technology for creating smart environments, smart systems, and smart services. Since security is a fundamental requirement of IoT platforms, solutions that can provide both encryption and authenticity simultaneously have recently attracted much attention from academia and industry. This article analyses in detail state-of-the-art lightweight authenticated encryption (LAE) targeted to IoT systems. This work provides a thorough description of the algorithms, and the study systematically classifies them to facilitate understanding of the relevant intricacies of the schemes. Among the reviewed algorithms, there is a trade-off between design security, resource cost, and performance efficiency. ACORN is the most effective scheme across various platforms in terms of resource utilization and power consumption, while MORUS and AES-CLOC are the fastest on hardware platforms. However, they are susceptible to misuse despite their resistance to side-channel attacks. In contrast, JOLTIK, PRIMATEs, COLM, Deoxys-II, OCB, and AES-JAMBU are provably resistant to nonce misuse. The challenges for possible future research are summarized. Overall, the article provides researchers and developers with practical guidance on various design aspects and limitations, as well as open research challenges in current lightweight authenticated encryption for IoT.
... Although the CRAM format outlines the file layout and decoding mechanism, the encoder has full control over how the data is broken up. The principal challenge in this area is to find a way to characterize, or encode, the structure of graphs so that machine learning models can readily exploit it [10], [11]. For example, one might want to classify the role of a protein in a biological interaction graph, determine the contribution of a sample under consideration in a collaboration network, recommend new friends to a social network user, or predict new therapeutic applications of existing drugs, whose molecular structure can be described as a graph [12], [13]. ...
Article
Owing to the substantial size of human genome sequence data files (30 to 200 GB), genomic data compression has gained considerable traction, and storage costs are one of the major problems faced by genomics laboratories. This calls for modern data compression technology that reduces storage while preserving the reliability of the operation. There have been few attempts to solve this problem independently of both hardware and software. A systematic analysis of associations between genes provides techniques for recognizing functional connections among genes and their respective products, as well as insights into essential biological events that are most important for understanding health and disease phenotypes. This research proposes a reliable and efficient deep learning system that learns embedded projections combining gene interactions and gene expression, and compares the predictions of the deep embeddings with strong baselines. In this paper, we perform data processing operations and predict gene function, along with gene ontology reconstruction and gene interaction prediction. The three major steps of genomic data compression are data extraction, storage, and retrieval. Hence, we propose a deep learning approach based on computational optimization techniques that is efficient in all three stages of data compression.
... Homomorphic encryption can be used to encrypt stored genomic data; nevertheless, it is susceptible to brute-force attacks [57]. Hosseini et al. [58] presented CRYFA, a low-overhead tool that compresses and encrypts FASTA files and is capable of recognising various digital DNA file formats. CRYFA operates in two phases: phase one divides the DNA file into blocks and shuffles them, and phase two encrypts the file with AES. ...
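Phase one (splitting into blocks and shuffling them) can be illustrated with a key-dependent permutation. The sketch below is hypothetical: it derives the permutation by sorting block indices on HMAC-SHA256 values, which is not necessarily how Cryfa shuffles, the key and block size are placeholders, and the AES encryption of phase two is left out.

```python
import hashlib
import hmac


def keyed_permutation(key: bytes, n: int) -> list[int]:
    """Order indices 0..n-1 by their HMAC-SHA256 tags: a key-dependent permutation."""
    def tag(i: int) -> bytes:
        return hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return sorted(range(n), key=tag)


def shuffle_blocks(data: bytes, key: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks (the last may be shorter) and permute them."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    perm = keyed_permutation(key, len(blocks))
    return [blocks[j] for j in perm]


def unshuffle_blocks(shuffled: list[bytes], key: bytes) -> bytes:
    """Invert shuffle_blocks: position k holds original block perm[k]."""
    perm = keyed_permutation(key, len(shuffled))
    blocks: list[bytes] = [b""] * len(shuffled)
    for k, j in enumerate(perm):
        blocks[j] = shuffled[k]
    return b"".join(blocks)
```

Returning the shuffled blocks as a list keeps their boundaries explicit, which matters because the final block may be shorter than the others.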
Article
DNA sequencing technologies have advanced significantly in the last few years, leading to advances in biomedical research that have improved personalised medicine and the discovery of new treatments for diseases. Sequencing technology advancement has also reduced the cost of DNA sequencing, which has led to the rise of direct-to-consumer (DTC) sequencing, e.g. 23andme.com, ancestry.co.uk, etc. Meanwhile, concerns have emerged over privacy and security in collecting, handling, analysing and sharing DNA and genomic data. DNA data are unique and can be used to identify individuals. Moreover, those data provide information on people’s current disease status and predisposition, e.g. mental health or susceptibility to developing cancer. Because of its hereditary nature, a DNA privacy violation affects not only the owner but also their close blood relatives. This article introduces and defines the term ‘digital DNA life cycle’ and presents an overview of privacy and security threats and their mitigation techniques for predigital DNA and throughout the digital DNA life cycle. It covers DNA sequencing hardware, software and the DNA sequence pipeline, in addition to common privacy attacks and their countermeasures when digital DNA data are stored, queried or shared. Likewise, the article examines DTC genomic sequencing privacy and security.
... The link manager sets up links, negotiates features, and administers active connections. 4. Logical Link Control and Adaptation Protocol (L2CAP): its job is to segment large chunks of data into smaller ones [5]. ...
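The segmentation role attributed to L2CAP can be illustrated generically. A large payload is cut into pieces no larger than a maximum size and reassembled in order. The size limit and function names below are hypothetical, and real L2CAP packets carry headers that this sketch omits.

```python
def segment(payload: bytes, max_size: int) -> list[bytes]:
    """Split a payload into consecutive chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [payload[i:i + max_size] for i in range(0, len(payload), max_size)]


def reassemble(chunks: list[bytes]) -> bytes:
    """Concatenate chunks received in order back into the original payload."""
    return b"".join(chunks)
```

For example, a 100-byte payload with a 23-byte limit becomes four 23-byte chunks followed by an 8-byte chunk.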
Chapter
In this chapter, the main goal is to obtain a cryptosystem with lower complexity that produces a smaller ciphertext. Encryption and decryption provide a secure way to transfer data from one point to another. Bluetooth, a wireless technology, uses short-range radio and is designed with low-energy devices in mind. Data transferred over Bluetooth also needs security; for this, the data to be sent is encrypted and later decrypted using the same key. The whole process requires key generation and multiple rounds, which in turn consume energy. Because such devices have low processing power, heavy computation is not possible. If the host to which the Bluetooth device is attached has ample computational power, there is no problem; when that is not the case, it is better to design a system that consumes less energy and requires less computational power. The approach in this chapter, which uses genetic codons for encryption, needs neither heavy calculation nor a large volume of memory. It is a simple substitution, but it involves preprocessing to obtain the required plaintext. The process does not require any major mathematical calculation and is therefore computationally economical. The plaintext is merely mapped to genetic codons to convert it into ciphertext, but certain additional processing steps give the ciphertext the necessary strength. Owing to its versatility, the genetic codon provides even stronger ciphertext than its mathematical counterpart.
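The core idea of mapping plaintext symbols to genetic codons can be sketched as a key-dependent substitution table. There are 4^3 = 64 codons, so the sketch assumes a hypothetical 64-symbol plaintext alphabet produced by the preprocessing step. The alphabet, the key-derived ordering, and the function names are all illustrative assumptions. The chapter's actual mapping and its additional strengthening steps are not modelled here.

```python
import hashlib
import itertools
import string

# Hypothetical 64-symbol plaintext alphabet: letters, digits, space and full stop.
ALPHABET = string.ascii_letters + string.digits + " ."
CODONS = ["".join(c) for c in itertools.product("ACGT", repeat=3)]  # 64 codons


def codon_table(key: bytes) -> dict[str, str]:
    """Key-dependent one-to-one mapping from plaintext symbols to codons."""
    order = sorted(CODONS, key=lambda codon: hashlib.sha256(key + codon.encode()).digest())
    return dict(zip(ALPHABET, order))


def encrypt(text: str, key: bytes) -> str:
    """Replace each plaintext symbol by its codon."""
    table = codon_table(key)
    return "".join(table[ch] for ch in text)


def decrypt(ciphertext: str, key: bytes) -> str:
    """Read the ciphertext three bases at a time and invert the table."""
    inverse = {codon: ch for ch, codon in codon_table(key).items()}
    return "".join(inverse[ciphertext[i:i + 3]] for i in range(0, len(ciphertext), 3))
```

Because each symbol always maps to the same codon, a table like this is only a building block. Its appeal for low-power devices is that encryption is a table lookup with no rounds.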
... Large volumes of data have raised concerns about secure storage, privacy, and accessibility of data [6]. Encryption along with compression is the solution to address these issues. Cryptographic schemes can be applied for the security and confidential accessibility of data. ...
... Cryptographic schemes can be applied for the security and confidential accessibility of data [6]. The use of cloud computing and blockchain (for bioinformatics applications) opens new challenges for compression tools. Encryption of compressed datasets is a challenging task that needs to address security along with other parameters. ...
Article
Recent advances in sequencing methods have led to a significant increase in sequencing data, which raises research challenges such as storage, transfer, and processing. Data compression techniques have been adopted to cope with the storage of these data, with good achievements in compression ratio and execution time. This fast-paced advancement has raised major concerns about data security: the confidentiality, integrity, and authenticity of data need to be ensured. This paper presents a novel lossless, reference-free algorithm that combines data compression with encryption to achieve security in addition to other parameters. The proposed algorithm preprocesses the data before applying a general-purpose compression library. A genetic algorithm is used to encrypt the data. The technique is validated with experimental results on benchmark datasets, and a comparative analysis with state-of-the-art techniques is presented. The results show that the proposed method achieves better results than existing methods.
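The preprocessing step is not specified in the abstract. One common choice, shown here purely as an assumption, is to pack each nucleotide into 2 bits before handing the bytes to a general-purpose library such as zlib. The length prefix and helper names are hypothetical, the input is assumed to contain only A, C, G and T, and the genetic-algorithm encryption is not modelled.

```python
import zlib

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string 4 bases per byte, prefixed by an 8-byte length."""
    out = bytearray(len(seq).to_bytes(8, "big"))
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (6 - 2 * j)  # first base in the top bits
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """Invert pack, dropping the padding in the final byte."""
    n = int.from_bytes(data[:8], "big")
    bases = []
    for byte in data[8:]:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 3])
    return "".join(bases[:n])


def compress(seq: str) -> bytes:
    """Preprocess (2-bit packing), then apply the general-purpose compressor."""
    return zlib.compress(pack(seq), 9)


def decompress(blob: bytes) -> str:
    return unpack(zlib.decompress(blob))
```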
... The most common formats [7,17] for storing genomic information do not provide confidentiality mechanisms, and thus no attacks can be proposed against them. To remedy this lack of confidentiality, other formats have since been introduced: Cryfa [12] for unaligned data and SECRAM [13] for aligned data. To the best of our knowledge, no attacks have been proposed against any of them. ...
Article
New genome sequencing technologies have decreased the cost of generating genomic data, thus increasing storage needs. The International Organization for Standardization (ISO) working group MPEG has developed a standard for genomic data compression with encryption features. The approach taken in the MPEG-G standard (ISO/IEC 23092) to compress genomic information was to group similar data into streams. Accordingly, one of the protection options considered was to encrypt each stream separately. In this paper, we show that if streams are encrypted separately, an attacker can use an unencrypted stream to deduce the encrypted content. To do so, we present two different attacks, one based on signal processing and the other based on neural networks. The signal-based attack only works in unrealistic settings, whereas the neural-network-based one recovers data in realistic settings (regarding read length and coverage). These results led MPEG to reconsider its encryption strategy before final publication of the standard, discarding the separate-stream encryption approach.
... There are a variety of solutions for keeping (genomic) data secure, each addressing different aspects of data handling. Cryfa (Hosseini et al., 2019) is a software tool aimed at encrypting genomic data while also compressing it by taking advantage of the known structure of common genomic file formats. The goal of Cryfa is to enable secure storage/archiving of genomic data. ...
Article
Motivation The majority of genome analysis tools and pipelines require data to be decrypted for access. This potentially leaves sensitive genetic data exposed, either because the unencrypted data is not removed after analysis, or because the data leaves traces on the permanent storage medium. Results We defined a file container specification enabling direct byte-level compatible random access to encrypted genetic data stored in community standards such as SAM/BAM/CRAM/VCF/BCF. By standardizing this format, we show how it can be added as a native file format to genomic libraries, enabling direct analysis of encrypted data without the need to create a decrypted copy. Availability The Crypt4GH specification can be found at: http://samtools.github.io/hts-specs/crypt4gh.pdf
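Byte-level random access is possible because the Crypt4GH specification encrypts the data in fixed 64 KiB plaintext segments, each stored as a 12-byte nonce, the ChaCha20-Poly1305 ciphertext and a 16-byte MAC. Under that layout, a reader can compute which encrypted segment holds a given plaintext offset and decrypt only that segment. The sketch assumes the header length is already known from parsing the header, ignores edit lists, and uses function names of our own.

```python
SEGMENT_SIZE = 65536  # plaintext bytes per segment
NONCE_SIZE = 12
MAC_SIZE = 16
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE  # 65564 bytes on disk


def locate(plain_offset: int, header_length: int) -> tuple[int, int, int]:
    """Map a plaintext byte offset to (segment index, file offset of that
    encrypted segment, position of the byte inside the decrypted segment)."""
    if plain_offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(plain_offset, SEGMENT_SIZE)
    file_offset = header_length + index * CIPHER_SEGMENT_SIZE
    return index, file_offset, within
```

For example, with a hypothetical 124-byte header, plaintext offset 70000 lies 4464 bytes into segment 1, whose encrypted form starts at file offset 65688.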
... Exploring distributed and private biobanks without compromising ethical regulations is a complex subject already studied by the research community. Almost all sequenced genomes are currently stored in protected repositories with strict access rules or securely encrypted [15][16][17][18]. However, accessing those datasets allows analysis of a larger number of subjects in order to identify genetic variants that are statistically correlated [19]. ...
Article
Privacy issues, often arising from the multiple dimensionality and sensitivity of the data combined with access restrictions and policies, limit the analysis and cross-exploration of most distributed and private biobanks. These characteristics prevent collaboration between entities, constituting a barrier to emerging personalized and public health challenges, namely the discovery of new druggable targets, the identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks. The strategies involved in the proposed methodology efficiently enable the creation and execution of unified genomic studies using distributed repositories, without compromising the information present in the datasets. We apply the methodology to a case study on the current COVID-19 pandemic, combining the diagnostics from multiple entities while maintaining privacy through a completely identical procedure. Moreover, we show that the methodology follows a simple, intuitive, and practical scheme.