Table 2 - uploaded by A. Zelikovsky
Gap from the Lower Bound Given by Theorem 4 and CPU Seconds (Averages over 10 Random Instances) for the Four In-Place Embedding Optimization Algorithms

Source publication
Article
Design of DNA arrays for very large-scale immobilized polymer synthesis (VLSIPS) (Fodor et al., 1991) seeks to minimize effects of unintended illumination during mask exposure steps. Hannenhalli et al. (2002) formulate this requirement as the Border Minimization Problem and give an algorithm for placement of probes at array sites under the assumption...

Context in source publication

Context 1
... improve the runtime, we stop all algorithms as soon as the improvement for an iteration drops below 0.1% of the total number of conflicts. Table 2 gives the results obtained by the four algorithms when applied to the ...
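As a minimal illustration of this stopping rule, here is a sketch of such a driver loop (ours, not the paper's code); improve_once and count_conflicts are hypothetical stand-ins for one optimization pass and the conflict counter of any of the four algorithms.

```python
# Hypothetical driver implementing the quoted 0.1% stopping rule.
# improve_once(array) performs one optimization pass in place;
# count_conflicts(array) returns the current total border conflicts.
# Both helpers are stand-ins, not names from the paper.

def optimize(array, improve_once, count_conflicts, tol=0.001):
    total = count_conflicts(array)
    while True:
        improve_once(array)
        new_total = count_conflicts(array)
        # Stop once an iteration improves by at most tol of the
        # current total (the paper's 0.1% rule); <= also terminates
        # cleanly when no conflicts remain.
        if total - new_total <= tol * total:
            return array
        total = new_total
```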

Similar publications

Article
Future MEMS systems will be composed of a larger variety of devices with very different functionality, such as electronics, mechanics, optics and bio-chemistry. Integration technology for heterogeneous devices must be developed. This article first deals with the current development trend of new fabrication technologies; those include self-assembling o...

Citations

... Single: Gives optimal solution for arrangement of DNA array design
2. [11] Single: Statistical analysis of the ALPHABET-LEFTMOST approximation algorithm; proposed a heuristic approach to reduce the length of the supersequence
3. [12] Single: Heuristic approach for both synchronous and asynchronous DNA array design
4. [13] Multiple: Proposed a cost function to perform synthesis cost analysis of multiple DNA arrays based on GC-content
5. [14] Multiple: Three problems were solved: (i) number of distinct subsequences, (ii) number of -restricted -generated sequences, (iii) exact length distribution of the longest increasing subsequences
6. [15] Single: Placement method used to increase the performance of DNA microarrays composed of small DNA fragments (probes)
7. [16] Multiple: Border length minimization (BLMP) has been attempted by parallel algorithms following the local-search paradigm
8. [17] Multiple: Genome sequencing analysis based on clustering methods and alphabet contents
2. For subsets S1 that do not belong to D = {A, C, G, T}^n, and for a prover with probabilistic polynomial time, the probability of accepting the input S1 by V is negligible. ...
Article
We consider the problem of optimizing the steps involved in the synthesis of DNA strings on a large scale. DNA molecules are a well-known reliable medium for storing large volumes of digital data; at the same time, their real-world use is severely restricted by their high cost. A large cluster of DNA strings of a fixed length (random quaternary) has to be partitioned into batches of finite size with respect to some reference strand, such that the sum of the lengths of the reference strands corresponding to the batches is minimized. In this work, the problem is analyzed using a zero-knowledge simulator that recursively executes the protocol and thus helps achieve improved bounds on the cost function for each batch Bi. The proposed proof system also accepts input DNA strings both with the homopolymer constraint and without it. The simulator proposed for single-batch and multiple-batch optimization is further analyzed in terms of efficiency and running time, thereby improving the upper and lower bounds on the overall cost of each batch for a given DNA strand.
... For an overview of the biochemical DNA synthesis process, we refer the interested reader to the surveys [26,7]. Our work is motivated by several experimental papers that address the challenge of reducing the synthesis cost in both single and multi-batch settings [16,23,41,24,37,42,27,48,38,47]. Variants of the problem have also been studied that incorporate certain quality control measures [20,11,45,34]. ...
Preprint
Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cost of large scale DNA synthesis, which translates to the following algorithmic task. Given a large pool $\mathcal{S}$ of random quaternary strings of fixed length, partition $\mathcal{S}$ into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches. We introduce two ideas for batch optimization that both improve (in different ways) upon a naive baseline: (1) using both $(ACGT)^{*}$ and its reverse $(TGCA)^{*}$ as reference strands, and batching appropriately, and (2) batching via the quantiles of an appropriate ordering of the strands. We also prove asymptotically matching lower bounds on the cost of DNA synthesis, showing that one cannot improve upon these two ideas. Our results uncover a surprising separation between two cases that naturally arise in the context of DNA data storage: the asymptotic cost savings of batch optimization are significantly greater in the case where strings in $\mathcal{S}$ do not contain repeats of the same character (homopolymers), as compared to the case where strings in $\mathcal{S}$ are unconstrained.
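To make the cost model concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of the naive single-reference baseline: when a batch is synthesized by following the periodic reference strand $(ACGT)^{*}$, a string is finished after t cycles exactly when it is a subsequence of the first t characters of ACGTACGT..., and the prefix length needed for the whole batch is an upper bound on that batch's shortest common supersequence.

```python
# A minimal sketch (ours, not the authors' code) of the periodic
# single-reference baseline: with reference strand (ACGT)*, a string s
# is synthesizable after t cycles iff s is a subsequence of the first
# t characters of ACGTACGT...; a batch finishes when its slowest
# string does.

REF = "ACGT"

def synthesis_cost(s: str) -> int:
    """Length of the shortest prefix of ACGTACGT... containing s as a
    subsequence, i.e., the number of machine cycles needed for s."""
    assert set(s) <= set(REF), "quaternary strings only"
    cycles = 0
    i = 0  # index of the next character of s still to be deposited
    while i < len(s):
        if s[i] == REF[cycles % 4]:
            i += 1
        cycles += 1
    return cycles

def batch_cost(batch: list[str]) -> int:
    """All strings in a batch share the reference strand, so the
    batch costs as many cycles as its most expensive string."""
    return max(synthesis_cost(s) for s in batch)

# Toy example: grouping strings of similar cost into the same batch
# (the quantile idea above) lowers the summed cost versus mixing them.
cheap, dear = "ACGT" * 2, "AAAA" * 2       # costs 8 and 29 cycles
mixed = batch_cost([cheap, dear]) + batch_cost([cheap, dear])
quantiled = batch_cost([cheap, cheap]) + batch_cost([dear, dear])
assert quantiled <= mixed                   # 37 <= 58
```

The toy example at the end shows why batching by cost quantiles helps: a single expensive outlier otherwise forces a whole batch to pay for its cycles.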
... Row-epitaxial and batched greedy algorithms are used in this study; in bioinformatics, these algorithms are used for probe placement and re-embedding [10,11]. ...
Article
Data centers are becoming the main backbone of and centralized repository for all cloud-accessible services in on-demand cloud computing environments. In particular, virtual data centers (VDCs) facilitate the virtualization of all data center resources such as computing, memory, storage, and networking equipment as a single unit. It is necessary to use the data center efficiently to improve its profitability. The essential factor that significantly influences efficiency is the average number of VDC requests serviced by the infrastructure provider, and the optimal allocation of requests improves the acceptance rate. In existing VDC request embedding algorithms, data center performance factors such as resource utilization rate and energy consumption are not taken into consideration. This motivated us to design a strategy for improving the resource utilization rate without increasing the energy consumption. We propose novel VDC embedding methods based on row-epitaxial and batched greedy algorithms inspired by bioinformatics. These algorithms embed new requests into the VDC while reembedding previously allocated requests. Reembedding is done to consolidate the available resources in the VDC resource pool. The experimental testbed results show that our algorithms boost the data center objectives of high resource utilization (by improving the request acceptance rate), low energy consumption, and short VDC request scheduling delay, leading to an appreciable return on investment.
... Experiments have been carried out to show the effectiveness of the algorithm. Other algorithms [4, 17, 18] have been proposed to improve the experimental results. Recently, the problem has been proved to be NP-hard [20] and $O(\sqrt{n})$-approximable [21], where n is the number of probes. ...
Conference Paper
We study a combinatorial problem arising from microarray synthesis. The objective of the BMP is to place a set of sequences in the array and to find an embedding of these sequences into a common supersequence such that the sum of the "border length" is minimized. A variant of the problem, called P-BMP, is that the placement is given and the concern is simply to find the embedding. Approximation algorithms have been proposed for the problem [21], but it is unknown whether the problem is NP-hard or not. In this paper, we give a comprehensive study of different variations of BMP by presenting NP-hardness proofs and improved approximation algorithms. We show that P-BMP, 1D-BMP, and BMP are all NP-hard. In contrast with the result in [21] that 1D-P-BMP is polynomial-time solvable, the interesting implications include: (i) the array dimension (1D or 2D) differentiates the complexity of P-BMP; (ii) for a 1D array, whether the placement is given differentiates the complexity of BMP; (iii) BMP is NP-hard regardless of the dimension of the array. Another contribution of the paper is improving the approximation for BMP from $O(n^{1/2}\log^2 n)$ to $O(n^{1/4}\log^2 n)$, where n is the total number of sequences.
... Experiments show that threading is effective in reducing border length. Since then, other algorithms [4, 15, 16] have been proposed to improve the experimental results. ...
... Asynchronous probe embedding was introduced by Kahng et al. [15]. They studied a special case in which the deposition sequence D is given and the embeddings of all but one probe are known. ...
... This algorithm is used as the basis for several heuristics [...]. On the other hand, there are few theoretical results. In [15], lower bounds on the total border length for the synchronous and asynchronous BMP were given, based on Hamming distance and the longest common subsequence (LCS), respectively. The asynchronous dynamic programming mentioned above computes the optimal embedding of a single probe in time O(ℓ|D|), where ℓ is the length of a probe and D is the deposition sequence. ...
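As a concrete illustration of the LCS-based bound mentioned in this excerpt, here is a short Python sketch (ours; the function names are our own). On our reading of the bound in [15], for any embeddings of two adjacent probes p and q into a common deposition sequence, the number of border conflicts is at least |p| + |q| − 2|LCS(p, q)|.

```python
# Sketch of the LCS-based lower bound on asynchronous border length
# between two adjacent probes: in any embedding, the number of border
# conflicts between p and q is at least len(p) + len(q) - 2*|LCS(p,q)|.

def lcs_length(p: str, q: str) -> int:
    """Classic O(len(p) * len(q)) dynamic program for the length of
    the longest common subsequence of p and q."""
    m, n = len(p), len(q)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == q[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def border_lower_bound(p: str, q: str) -> int:
    """LCS-based lower bound on the border conflicts between two
    adjacent probes, over all embeddings into a common deposition
    sequence."""
    return len(p) + len(q) - 2 * lcs_length(p, q)

# Identical probes can share an embedding (bound 0), while disjoint
# probes conflict at every deposition step.
assert border_lower_bound("ACGT", "ACGT") == 0
assert border_lower_bound("AAAA", "CCCC") == 8
```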
Conference Paper
We study the border minimization problem (BMP), which arises in microarray synthesis to place and embed probes in the array. The synthesis is based on a light-directed chemical process in which unintended illumination may contaminate the quality of the experiments. Border length is a measure of the amount of unintended illumination, and the objective of BMP is to find a placement and embedding of probes such that the border length is minimized. The problem is believed to be NP-hard. In this paper we show that BMP admits an $O(\sqrt{n}\log^2 n)$-approximation, where n is the number of probes to be synthesized. In the case where the placement is given in advance, we show that the problem is $O(\log^2 n)$-approximable. We also study a related problem called the agreement maximization problem (AMP). In contrast to BMP, we show that AMP admits a constant approximation even when the placement is not given in advance.
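For the special case of synchronous embedding, border length has a particularly simple form: on our reading of the model, two adjacent probes conflict twice for every position at which they differ (once when each side's nucleotide is deposited while the other cell is masked), so the total border length of a placement is twice the sum of Hamming distances over adjacent grid cells. The following small Python sketch (ours, not from the paper) computes it.

```python
# A rough sketch (ours) of synchronous border length for a given grid
# placement: total = 2 * sum of Hamming distances over all pairs of
# horizontally or vertically adjacent probes. Assumes all probes in
# the grid have equal length.

def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))

def synchronous_border_length(grid: list[list[str]]) -> int:
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += 2 * hamming(grid[r][c], grid[r + 1][c])
    return total

# 2x2 toy array of length-4 probes: three adjacent pairs differ in
# 0, 1, 1 and 2 positions, giving border length 2*(0+1+1+2) = 8.
assert synchronous_border_length([["ACGT", "ACGT"],
                                  ["ACGA", "TCGT"]]) == 8
```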
... • A comprehensive experimental study demonstrating significant solution-quality improvements for the enhanced methodologies. In particular, we show that a 5-7% improvement in border length can be achieved over the highest-quality scalable flow previously reported in the literature [35], [38] by a tighter integration of probe placement and embedding (Section IV-B). Furthermore, we ...
... For the same synchronous context, [34] suggested an epitaxial, or "seeded crystal growth", placement heuristic similar to heuristics explored in the VLSI circuit placement literature by [43], [48]. Very recently, [35], [38] proposed methods with near-linear runtime combining simple ordering-based heuristics for initial placement, such as lexicographic sorting followed by threading into the two-dimensional array of sites using the 1-threading method described in [26], with heuristics for placement improvement, such as optimal reassignment of an "independent" set of probes [50] chosen from a sliding window [18], or a row-based implementation of the epitaxial algorithm that speeds up the computation by considering only a limited number of candidates when filling each array site. The work of [35], [38] also extends probe placement algorithms to handle practical concerns such as pre-placed control probes, presence of polymorphic probes, unintended illumination between non-adjacent array sites, and position-dependent border conflicts. ...
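The ordering-based initial placement described in the excerpt above is easy to sketch. The following Python fragment is our own illustration (the exact 1-threading pattern of [26] may differ in detail): sort probes lexicographically so that neighbors in the order share long prefixes, then thread the sorted list into the array in serpentine row order so that consecutive probes remain adjacent on the grid.

```python
# A small sketch (ours) of ordering-based initial placement: sort
# probes lexicographically, then "thread" the sorted list into the 2D
# array in serpentine (boustrophedon) row order so that probes that
# are consecutive in the sorted order stay adjacent on the grid.

def threaded_placement(probes: list[str], rows: int, cols: int):
    assert len(probes) == rows * cols
    ordered = sorted(probes)
    grid = []
    for r in range(rows):
        row = ordered[r * cols:(r + 1) * cols]
        if r % 2 == 1:      # reverse every other row so the end of one
            row.reverse()   # row sits next to the start of the next
        grid.append(row)
    return grid

grid = threaded_placement(["TTTT", "ACGG", "ACGT", "TTTA"], 2, 2)
# sorted: ACGG, ACGT, TTTA, TTTT; second row reversed, giving
# [["ACGG", "ACGT"], ["TTTT", "TTTA"]]
```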
Article
DNA probe arrays, or DNA chips, have emerged as a core genomic technology that enables cost-effective gene expression monitoring, mutation detection, single nucleotide polymorphism analysis and other genomic analyses. DNA chips are manufactured through a highly scalable process, called Very Large-Scale Immobilized Polymer Synthesis (VLSIPS), that combines photolithographic technologies adapted from the semiconductor industry with combinatorial chemistry. As the number and size of DNA array designs continue to grow, there is an imperative need for highly scalable software tools with predictable solution quality to assist in the design and manufacturing process. In this chapter we review recent algorithmic and methodological advances forming the foundation for a new generation of DNA array design tools. A recurring motif behind these advances is the analogy between DNA chips and silicon chips, pointing to the value of technology transfer between the 40-year-old VLSI CAD field and the newer DNA array design field.
Article
In recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. Consequently, the high cost and low throughput limit the practical use of available DNA synthesis technologies. It has been found that the homopolymer run (i.e., the repetition of the same nucleotide) is a major factor affecting synthesis and sequencing errors. Recently, [26] raised and studied the coding problem for efficient synthesis for DNA-based storage systems. Among other things, they studied the maximal code size under synthesis constraints. In [29], the authors studied the role of batch optimization in reducing the cost of large-scale DNA synthesis, for a given pool S of random quaternary strings of fixed length. This problem is related to the problem posed in [26], which can be viewed as the opposite side of the coin: instead of seeking the largest code in which every codeword can be synthesized in a certain amount of time, they asked what is the average synthesis time of a randomly chosen string. Following the lead of [29], in this paper we take a step forward towards the theoretical understanding of DNA synthesis, and study the homopolymer run of length k ≥ 1. Specifically, we are given a set of DNA strands S, randomly drawn from a Markovian distribution modeling a general homopolymer run-length constraint, that we wish to synthesize. For this problem, we derive asymptotically tight high-probability lower and upper bounds on the cost of DNA synthesis, for any k ≥ 1. Our bounds imply that, perhaps surprisingly, the periodic sequence ACGT is asymptotically optimal in the sense of achieving the smallest possible cost. Our main technical contribution is the representation of the DNA synthesis process as a certain constrained system, for which string techniques can be applied.
Article
Microarrays are research tools used in gene discovery as well as disease and cancer diagnostics. Two prominent but challenging problems related to microarrays are the Border Minimization Problem (BMP) and the Border Minimization Problem with given placement (P-BMP). The common task of these two problems is to create so-called probe sequences (essentially a string) in a microarray. Here, the goal of the former problem is to determine an assignment of each probe sequence to a unique cell of the array and afterwards to construct the sequences at their respective cells while minimizing the border length of the probes. In contrast, for the latter problem the assignment of the probes to the cells is already given. In this paper we investigate the parameterized complexity of the natural exhaustive variants of BMP and P-BMP, termed \(\text{BMP}^e\) and \(\text{P-BMP}^e\) respectively, under several natural parameters. We show that \(\text{BMP}^e\) and \(\text{P-BMP}^e\) are in FPT under the following two combinations of parameters: (1) the size of the alphabet (c), the maximum length of a sequence (string) in the input (\(\ell\)) and the number of rows of the microarray (r); and, (2) the size of the alphabet and the size of the border length (o). Furthermore, \(\text{P-BMP}^e\) is in FPT when parameterized by c and \(\ell\). We complement our tractability results with a number of corresponding hardness results.