Table 2 - uploaded by A. Zelikovsky
Gap from the Lower Bound Given by Theorem 4 and CPU Seconds (Averages over 10 Random Instances) for the Four In-Place Embedding Optimization Algorithms

Source publication
Article
Design of DNA arrays for very large-scale immobilized polymer synthesis (VLSIPS) (Fodor et al., 1991) seeks to minimize effects of unintended illumination during mask exposure steps. Hannenhalli et al. (2002) formulate this requirement as the Border Minimization Problem and give an algorithm for placement of probes at array sites under the assumption...

Context in source publication

Context 1
... improve the runtime, we stop all algorithms as soon as the improvement for an iteration drops below 0.1% of the total number of conflicts. Table 2 gives the results obtained by the four algorithms when applied to the ...
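As a minimal illustration of this stopping rule, here is a sketch of such a driver loop (ours, not the paper's code); improve_once and count_conflicts are hypothetical stand-ins for one optimization pass and the conflict counter of any of the four algorithms.

```python
# Hypothetical driver implementing the quoted 0.1% stopping rule.
# improve_once(array) performs one optimization pass in place;
# count_conflicts(array) returns the current total border conflicts.
# Both helpers are stand-ins, not names from the paper.

def optimize(array, improve_once, count_conflicts, tol=0.001):
    total = count_conflicts(array)
    while True:
        improve_once(array)
        new_total = count_conflicts(array)
        # Stop once an iteration improves by at most tol of the
        # current total (the paper's 0.1% rule); <= also terminates
        # cleanly when no conflicts remain.
        if total - new_total <= tol * total:
            return array
        total = new_total
```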

Similar publications

Article
Future MEMS systems will be composed of a larger variety of devices with very different functionality, such as electronics, mechanics, optics and bio-chemistry. Integration technology for heterogeneous devices must be developed. This article first deals with the current development trend of new fabrication technologies; those include self-assembling o...

Citations

... Single: Gives optimal solution for arrangement of DNA array design
2. [11] Single: Statistical analysis of the ALPHABET-LEFTMOST approximation algorithm; proposed a heuristic approach to reduce the length of the supersequence
3. [12] Single: Heuristic approach for both synchronous and asynchronous DNA array design
4. [13] Multiple: Proposed a cost function to perform synthesis cost analysis of multiple DNA arrays based on GC-content
5. [14] Multiple: Three problems were solved: (i) number of distinct subsequences, (ii) number of -restricted -generated sequences, (iii) exact length distribution of the longest increasing subsequences
6. [15] Single: Placement method used to increase the performance of DNA microarrays composed of small DNA fragments (probes)
7. [16] Multiple: Border length minimization (BLMP) has been attempted by parallel algorithms following the local-search paradigm
8. [17] Multiple: Genome sequencing analysis based on clustering methods and alphabet contents
2. For subsets S1 that do not belong to D = {A, C, G, T}^n, and for a prover with probabilistic polynomial time, the probability of accepting the input S1 by V is negligible. ...
Article
We consider the problem of optimizing the steps involved in the synthesis of DNA strings on a large scale. DNA molecules are a well-known reliable medium for storing large volumes of digital data; at the same time, their real-world use is severely restricted by their high cost. A large cluster of DNA strings of a fixed length (random quaternary) has to be partitioned into batches of finite size with respect to some reference strand, such that the sum of the lengths of the reference strands corresponding to the batches is minimized. In this work, the problem is analyzed using a zero-knowledge simulator that recursively executes the protocol and thus helps achieve improved bounds on the cost function for each batch Bi. The proposed proof system also accepts input DNA strings both with the homopolymer constraint and without it. The simulator proposed for single-batch and multiple-batch optimization is further analyzed in terms of efficiency and running time, thereby improving the upper and lower bounds on the overall cost of each batch for a given DNA strand.
... For an overview of the biochemical DNA synthesis process, we refer the interested reader to the surveys [26,7]. Our work is motivated by several experimental papers that address the challenge of reducing the synthesis cost in both single and multi-batch settings [16,23,41,24,37,42,27,48,38,47]. Variants of the problem have also been studied that incorporate certain quality control measures [20,11,45,34]. ...
Preprint
Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cost of large scale DNA synthesis, which translates to the following algorithmic task. Given a large pool $\mathcal{S}$ of random quaternary strings of fixed length, partition $\mathcal{S}$ into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches. We introduce two ideas for batch optimization that both improve (in different ways) upon a naive baseline: (1) using both $(ACGT)^{*}$ and its reverse $(TGCA)^{*}$ as reference strands, and batching appropriately, and (2) batching via the quantiles of an appropriate ordering of the strands. We also prove asymptotically matching lower bounds on the cost of DNA synthesis, showing that one cannot improve upon these two ideas. Our results uncover a surprising separation between two cases that naturally arise in the context of DNA data storage: the asymptotic cost savings of batch optimization are significantly greater in the case where strings in $\mathcal{S}$ do not contain repeats of the same character (homopolymers), as compared to the case where strings in $\mathcal{S}$ are unconstrained.
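To make the cost model concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of the naive single-reference baseline: when a batch is synthesized by following the periodic reference strand $(ACGT)^{*}$, a string is finished after t cycles exactly when it is a subsequence of the first t characters of ACGTACGT..., and the prefix length needed for the whole batch is an upper bound on that batch's shortest common supersequence.

```python
# A minimal sketch (ours, not the authors' code) of the periodic
# single-reference baseline: with reference strand (ACGT)*, a string s
# is synthesizable after t cycles iff s is a subsequence of the first
# t characters of ACGTACGT...; a batch finishes when its slowest
# string does.

REF = "ACGT"

def synthesis_cost(s: str) -> int:
    """Length of the shortest prefix of ACGTACGT... containing s as a
    subsequence, i.e., the number of machine cycles needed for s."""
    assert set(s) <= set(REF), "quaternary strings only"
    cycles = 0
    i = 0  # index of the next character of s still to be deposited
    while i < len(s):
        if s[i] == REF[cycles % 4]:
            i += 1
        cycles += 1
    return cycles

def batch_cost(batch: list[str]) -> int:
    """All strings in a batch share the reference strand, so the
    batch costs as many cycles as its most expensive string."""
    return max(synthesis_cost(s) for s in batch)

# Toy example: grouping strings of similar cost into the same batch
# (the quantile idea above) lowers the summed cost versus mixing them.
cheap, dear = "ACGT" * 2, "AAAA" * 2       # costs 8 and 29 cycles
mixed = batch_cost([cheap, dear]) + batch_cost([cheap, dear])
quantiled = batch_cost([cheap, cheap]) + batch_cost([dear, dear])
assert quantiled <= mixed                   # 37 <= 58
```

The toy example at the end shows why batching by cost quantiles helps: a single expensive outlier otherwise forces a whole batch to pay for its cycles.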
... Row-epitaxial and batched greedy algorithms are used in this study; in bioinformatics, these algorithms are used for probe placement and re-embedding [10,11]. ...
Article
Data centers are becoming the main backbone of and centralized repository for all cloud-accessible services in on-demand cloud computing environments. In particular, virtual data centers (VDCs) facilitate the virtualization of all data center resources such as computing, memory, storage, and networking equipment as a single unit. It is necessary to use the data center efficiently to improve its profitability. The essential factor that significantly influences efficiency is the average number of VDC requests serviced by the infrastructure provider, and the optimal allocation of requests improves the acceptance rate. In existing VDC request embedding algorithms, data center performance factors such as resource utilization rate and energy consumption are not taken into consideration. This motivated us to design a strategy for improving the resource utilization rate without increasing the energy consumption. We propose novel VDC embedding methods based on row-epitaxial and batched greedy algorithms inspired by bioinformatics. These algorithms embed new requests into the VDC while reembedding previously allocated requests. Reembedding is done to consolidate the available resources in the VDC resource pool. The experimental testbed results show that our algorithms boost the data center objectives of high resource utilization (by improving the request acceptance rate), low energy consumption, and short VDC request scheduling delay, leading to an appreciable return on investment.
... Experiments have been carried out to show the effectiveness of the algorithm. Other algorithms [4, 17, 18] have been proposed to improve the experimental results. Recently, the problem has been proved to be NP-hard [20] and $O(\sqrt{n})$-approximable [21], where n is the number of probes. ...
Conference Paper
We study a combinatorial problem arising from microarray synthesis. The objective of the BMP is to place a set of sequences in the array and to find an embedding of these sequences into a common supersequence such that the sum of the "border length" is minimized. A variant of the problem, called P-BMP, is that the placement is given and the concern is simply to find the embedding. Approximation algorithms have been proposed for the problem [21], but it is unknown whether the problem is NP-hard or not. In this paper, we give a comprehensive study of different variations of BMP by presenting NP-hardness proofs and improved approximation algorithms. We show that P-BMP, 1D-BMP, and BMP are all NP-hard. In contrast with the result in [21] that 1D-P-BMP is polynomial-time solvable, the interesting implications include: (i) the array dimension (1D or 2D) differentiates the complexity of P-BMP; (ii) for a 1D array, whether the placement is given differentiates the complexity of BMP; (iii) BMP is NP-hard regardless of the dimension of the array. Another contribution of the paper is improving the approximation for BMP from $O(n^{1/2}\log^2 n)$ to $O(n^{1/4}\log^2 n)$, where n is the total number of sequences.
... Experiments show that threading is effective in reducing border length. Since then, other algorithms [4, 15, 16] have been proposed to improve the experimental results. ...
... Asynchronous probe embedding was introduced by Kahng et al. [15]. They studied a special case in which the deposition sequence D is given and the embeddings of all but one probe are known. ...
... This algorithm is used as the basis for several heuristics [...]. On the other hand, there are few theoretical results. In [15], lower bounds on the total border length for the synchronous and asynchronous BMP were given, based on Hamming distance and the longest common subsequence (LCS), respectively. The asynchronous dynamic programming mentioned above computes the optimal embedding of a single probe in time O(ℓ|D|), where ℓ is the length of a probe and D is the deposition sequence. ...
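As a concrete illustration of the LCS-based bound mentioned in this excerpt, here is a short Python sketch (ours; the function names are our own). On our reading of the bound in [15], for any embeddings of two adjacent probes p and q into a common deposition sequence, the number of border conflicts is at least |p| + |q| − 2|LCS(p, q)|.

```python
# Sketch of the LCS-based lower bound on asynchronous border length
# between two adjacent probes: in any embedding, the number of border
# conflicts between p and q is at least len(p) + len(q) - 2*|LCS(p,q)|.

def lcs_length(p: str, q: str) -> int:
    """Classic O(len(p) * len(q)) dynamic program for the length of
    the longest common subsequence of p and q."""
    m, n = len(p), len(q)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == q[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def border_lower_bound(p: str, q: str) -> int:
    """LCS-based lower bound on the border conflicts between two
    adjacent probes, over all embeddings into a common deposition
    sequence."""
    return len(p) + len(q) - 2 * lcs_length(p, q)

# Identical probes can share an embedding (bound 0), while disjoint
# probes conflict at every deposition step.
assert border_lower_bound("ACGT", "ACGT") == 0
assert border_lower_bound("AAAA", "CCCC") == 8
```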
Conference Paper
We study the border minimization problem (BMP), which arises in microarray synthesis to place and embed probes in the array. The synthesis is based on a light-directed chemical process in which unintended illumination may contaminate the quality of the experiments. Border length is a measure of the amount of unintended illumination, and the objective of BMP is to find a placement and embedding of probes such that the border length is minimized. The problem is believed to be NP-hard. In this paper we show that BMP admits an $O(\sqrt{n}\log^2 n)$-approximation, where n is the number of probes to be synthesized. In the case where the placement is given in advance, we show that the problem is $O(\log^2 n)$-approximable. We also study a related problem called the agreement maximization problem (AMP). In contrast to BMP, we show that AMP admits a constant approximation even when the placement is not given in advance.
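For the special case of synchronous embedding, border length has a particularly simple form: on our reading of the model, two adjacent probes conflict twice for every position at which they differ (once when each side's nucleotide is deposited while the other cell is masked), so the total border length of a placement is twice the sum of Hamming distances over adjacent grid cells. The following small Python sketch (ours, not from the paper) computes it.

```python
# A rough sketch (ours) of synchronous border length for a given grid
# placement: total = 2 * sum of Hamming distances over all pairs of
# horizontally or vertically adjacent probes. Assumes all probes in
# the grid have equal length.

def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))

def synchronous_border_length(grid: list[list[str]]) -> int:
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += 2 * hamming(grid[r][c], grid[r + 1][c])
    return total

# 2x2 toy array of length-4 probes: three adjacent pairs differ in
# 0, 1, 1 and 2 positions, giving border length 2*(0+1+1+2) = 8.
assert synchronous_border_length([["ACGT", "ACGT"],
                                  ["ACGA", "TCGT"]]) == 8
```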
... • A comprehensive experimental study demonstrating significant solution-quality improvements for the enhanced methodologies. In particular, we show that a 5-7% improvement in border length can be achieved over the highest-quality scalable flow previously reported in the literature [35], [38] by a tighter integration of probe placement and embedding (Section IV-B). Furthermore, we ...
... For the same synchronous context, [34] suggested an epitaxial, or "seeded crystal growth", placement heuristic similar to heuristics explored in the VLSI circuit placement literature by [43], [48]. Very recently, [35], [38] proposed methods with near-linear runtime combining simple ordering-based heuristics for initial placement, such as lexicographic sorting followed by threading into the two-dimensional array of sites using the 1-threading method described in [26], with heuristics for placement improvement, such as optimal reassignment of an "independent" set of probes [50] chosen from a sliding window [18], or a row-based implementation of the epitaxial algorithm that speeds up the computation by considering only a limited number of candidates when filling each array site. The work of [35], [38] also extends probe placement algorithms to handle practical concerns such as pre-placed control probes, presence of polymorphic probes, unintended illumination between non-adjacent array sites, and position-dependent border conflicts. ...
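The ordering-based initial placement described in the excerpt above is easy to sketch. The following Python fragment is our own illustration (the exact 1-threading pattern of [26] may differ in detail): sort probes lexicographically so that neighbors in the order share long prefixes, then thread the sorted list into the array in serpentine row order so that consecutive probes remain adjacent on the grid.

```python
# A small sketch (ours) of ordering-based initial placement: sort
# probes lexicographically, then "thread" the sorted list into the 2D
# array in serpentine (boustrophedon) row order so that probes that
# are consecutive in the sorted order stay adjacent on the grid.

def threaded_placement(probes: list[str], rows: int, cols: int):
    assert len(probes) == rows * cols
    ordered = sorted(probes)
    grid = []
    for r in range(rows):
        row = ordered[r * cols:(r + 1) * cols]
        if r % 2 == 1:      # reverse every other row so the end of one
            row.reverse()   # row sits next to the start of the next
        grid.append(row)
    return grid

grid = threaded_placement(["TTTT", "ACGG", "ACGT", "TTTA"], 2, 2)
# sorted: ACGG, ACGT, TTTA, TTTT; second row reversed, giving
# [["ACGG", "ACGT"], ["TTTT", "TTTA"]]
```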
Article
DNA probe arrays, or DNA chips, have emerged as a core genomic technology that enables cost-effective gene expression monitoring, mutation detection, single nucleotide polymorphism analysis and other genomic analyses. DNA chips are manufactured through a highly scalable process, called Very Large-Scale Immobilized Polymer Synthesis (VLSIPS), that combines photolithographic technologies adapted from the semiconductor industry with combinatorial chemistry. As the number and size of DNA array designs continue to grow, there is an imperative need for highly scalable software tools with predictable solution quality to assist in the design and manufacturing process. In this chapter we review recent algorithmic and methodological advances forming the foundation for a new generation of DNA array design tools. A recurring motif behind these advances is the analogy between DNA chips and silicon chips, pointing to the value of technology transfer between the 40-year-old VLSI CAD field and the newer DNA array design field.
Article
In recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. Consequently, the high cost and low throughput limit the practical use of available DNA synthesis technologies. It has been found that the homopolymer run (i.e., the repetition of the same nucleotide) is a major factor affecting synthesis and sequencing errors. Recently, [26] raised and studied the coding problem for efficient synthesis for DNA-based storage systems. Among other things, they studied the maximal code size under synthesis constraints. In [29], the authors studied the role of batch optimization in reducing the cost of large-scale DNA synthesis, for a given pool S of random quaternary strings of fixed length. This problem is related to the problem posed in [26], which can be viewed as the opposite side of the coin: instead of seeking the largest code in which every codeword can be synthesized in a certain amount of time, they asked what is the average synthesis time of a randomly chosen string. Following the lead of [29], in this paper we take a step forward towards the theoretical understanding of DNA synthesis, and study the homopolymer run of length k ≥ 1. Specifically, we are given a set of DNA strands S, randomly drawn from a Markovian distribution modeling a general homopolymer run-length constraint, that we wish to synthesize. For this problem, we derive asymptotically tight high-probability lower and upper bounds on the cost of DNA synthesis, for any k ≥ 1. Our bounds imply that, perhaps surprisingly, the periodic sequence ACGT is asymptotically optimal in the sense of achieving the smallest possible cost. Our main technical contribution is the representation of the DNA synthesis process as a certain constrained system, for which string techniques can be applied.
Article
Microarrays are research tools used in gene discovery as well as disease and cancer diagnostics. Two prominent but challenging problems related to microarrays are the Border Minimization Problem (BMP) and the Border Minimization Problem with given placement (P-BMP). The common task of these two problems is to create so-called probe sequences (essentially a string) in a microarray. Here, the goal of the former problem is to determine an assignment of each probe sequence to a unique cell of the array and afterwards to construct the sequences at their respective cells while minimizing the border length of the probes. In contrast, for the latter problem the assignment of the probes to the cells is already given. In this paper we investigate the parameterized complexity of the natural exhaustive variants of BMP and P-BMP, termed \(\text{BMP}^e\) and \(\text{P-BMP}^e\) respectively, under several natural parameters. We show that \(\text{BMP}^e\) and \(\text{P-BMP}^e\) are in FPT under the following two combinations of parameters: (1) the size of the alphabet (c), the maximum length of a sequence (string) in the input (\(\ell\)) and the number of rows of the microarray (r); and, (2) the size of the alphabet and the size of the border length (o). Furthermore, \(\text{P-BMP}^e\) is in FPT when parameterized by c and \(\ell\). We complement our tractability results with a number of corresponding hardness results.