TABLE 1 - uploaded by William V Courtright
First-Order Comparison Between the RAID Levels for an N-disk Array


Source publication
Article
Redundant disk arrays provide highly-available, high-performance disk storage to a wide variety of applications. Because these applications often have distinct cost, performance, capacity, and availability requirements, researchers continue to develop new array architectures. RAIDframe was developed to assist researchers in the implementation and e...

Citations

... To speed up the failure recovery process and provide highly-available arrays, parity declustering was first proposed by Muntz and Lui [16] as a data layout technique, and it was further realized in practical systems by Holland and Gibson [13] based on Balanced Incomplete Block Design (BIBD). Owing to its benefits of fast recovery and high availability, parity declustering has been implemented in the software RAID device driver RAIDframe [6]; it is also deployed in the Panasas file system [25] and in modern erasure-coded storage systems [2]. ...
... To fairly compare the user I/O latency of PDS and round-robin, both scaling schemes should issue the same amount of data migration I/Os in each time slot. Because the volume of data migrated by PDS is less than that migrated by round-robin, when adding n = 1, 2, 3 disks we set sync_speed_min with PDS to 16, 9, and 6.66 MB/s (i.e., 2 × (7 + n)/n MB/s), respectively, so as to ensure approximately the same volume of migrated data as round-robin in each time slot. ...
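The rate formula quoted in this snippet is easy to check; a minimal sketch (the function name is hypothetical, not from the paper) reproducing the arithmetic for an array of 7 existing disks:

```python
# Sketch of the snippet's arithmetic: with 7 existing disks and n added
# disks, PDS migrates a fraction n/(7+n) of what round-robin moves, so its
# per-slot sync rate is scaled up to 2*(7+n)/n MB/s to equalize the volume.
def pds_sync_speed(existing: int, added: int, base_rate_mb: float = 2.0) -> float:
    """Sync rate (MB/s) equalizing migrated volume with round-robin."""
    return base_rate_mb * (existing + added) / added

rates = [round(pds_sync_speed(7, n), 2) for n in (1, 2, 3)]
print(rates)  # [16.0, 9.0, 6.67] -- matching the 16, 9, ~6.66 MB/s above
```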
Article
Parity declustering is widely deployed in erasure-coded storage systems to provide fast recovery and high data availability. However, scaling such RAIDs requires preserving the parity-declustered data layout so that its properties still hold after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied to scale RAIDs with parity declustering. To address this challenge, we develop an efficient online scaling scheme called PDS (Parity Declustering Scaling), which employs an auxiliary Balanced Incomplete Block Design to define the data migration in a way that preserves the parity-declustered data layout. Furthermore, PDS can also be applied to scale RAIDs for improved reliability and/or storage efficiency by optionally allocating more parity blocks and/or data blocks in stripes. We provide theoretical proofs to formally show that PDS preserves the parity-declustered data layout and achieves uniform distributions of data and parity blocks after scaling while requiring only minimal data migration. We implement PDS in Linux kernel 3.14.72 and evaluate its performance with real-world traces. Results show that, compared with a “moving-everything” round-robin approach adapted to achieve a parity-declustered data layout after scaling, PDS reduces scaling time by 82.37 percent and user response time during scaling by 18.25 percent on average.
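To illustrate the kind of BIBD-based placement this abstract refers to (a generic textbook construction, not the paper's PDS algorithm), the (7, 3, 1) design known as the Fano plane can map stripes of width 3 onto 7 disks so that every pair of disks shares exactly one block:

```python
from collections import Counter
from itertools import combinations

# The (7,3,1) BIBD (Fano plane), 0-indexed: 7 disks, stripe width 3.
# Illustrative only -- not the PDS construction from the paper.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def stripe_disks(stripe_id: int) -> tuple:
    """Disks a stripe touches: cycle through the design's blocks."""
    return FANO_BLOCKS[stripe_id % len(FANO_BLOCKS)]

# lambda = 1: every pair of disks co-occurs in exactly one block, so a
# failed disk's reconstruction reads spread evenly over the survivors.
pairs = Counter(p for blk in FANO_BLOCKS for p in combinations(blk, 2))
assert len(pairs) == 21 and all(c == 1 for c in pairs.values())
```

Because each surviving disk shares only one block with the failed disk per pass over the design, rebuild load per disk drops compared with a traditional RAID5 group of the same width.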
... In contrast, simulation models for arrays are very rare. To date, only one functional simulator has been developed, RAIDframe [9], aimed at redundant array software development. Others simulate just a part of the array under study, as in [10], where a simulator is used to study the performance of various striping methods and a buffer caching scheme, or in [7], where one is used to validate analytical model results. ...
Article
This paper presents a detailed development process for a dynamic, discrete-event simulation model of a disk array. It combines hierarchical decomposition with a "bottom-up" approach. In this way, the focus is first set on the elementary storage component, a single disk drive. Once a functional simulation model of the disk drive has been achieved, it is used as the basic storage element for the disk array model. Further, it is explored how to simulate the different interfaces inside a disk array toward the underlying disks. The difference in throughput between the developed model and measurements ranges from 1.5% to 3.16% for writing and from 2.5% to 2.8% for reading, depending on the interface type. However, these results are limited to the workload imposed by the requirements of the ALICE transient storage system, or more precisely, sequential storing and reading of large data files.
... (iv) rotational position modeling and detailed disk layout (2.6%). A less detailed simulator for the same disk drive reported a respectable demerit figure of 3.9% in Kotz et al. [1999]. The Pantheon simulator developed at HP Labs [Wilkes 1996] was used in the evaluation of AutoRAID. RAIDframe is a simulation as well as a rapid prototyping tool for RAID disk arrays [Courtright et al. 1996]. It has been superseded by the DiskSim simulation package [Bucy et al. 2008]. ...
Article
Parity is a popular form of data protection in redundant arrays of inexpensive/independent disks (RAID). RAID5 dedicates one out of N disks to parity to mask single disk failures; that is, the contents of a block on a failed disk can be reconstructed by exclusive-ORing the corresponding blocks on the surviving disks. RAID5 can mask a single disk failure, but it is vulnerable to data loss if a second disk failure occurs. The RAID5 rebuild process systematically reconstructs the contents of a failed disk on a spare disk, returning the system to its original state, but the rebuild may be unsuccessful due to unreadable sectors. This has led to two-disk-failure-tolerant arrays (2DFTs), such as RAID6 based on Reed-Solomon (RS) codes. EVENODD, RDP (Row-Diagonal Parity), the X-code, and RM2 (Row-Matrix) are 2DFTs with parity coding. RM2 incurs a higher level of redundancy than two disks, while the X-code is limited to a prime number of disks. RDP is optimal with respect to the number of XOR operations for encoding, but not for short write operations. For small symbol sizes EVENODD and RDP have the same disk access pattern as RAID6, while RM2 and the X-code incur a high recovery cost with two failed disks. We describe variations on the RAID5 and RAID6 organizations, including clustered RAID, different methods to update parities, rebuild processing, disk scrubbing to eliminate sector errors, and the intra-disk redundancy (IDR) method for dealing with sector errors. We summarize the results of recent studies of failures in hard disk drives. We describe Markov chain reliability models to estimate RAID mean time to data loss (MTTDL), taking into account sector errors and the effect of disk scrubbing. Numerical results show that RAID5 plus IDR attains the same MTTDL level as RAID6 while incurring a lower performance penalty. We conclude with a survey of analytic and simulation studies of RAID performance and of tools and benchmarks for RAID performance evaluation.
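The single-failure reconstruction principle this abstract describes (parity as the XOR of the data blocks) can be sketched in a few lines; block contents and names here are illustrative:

```python
# Parity is the XOR of the data blocks; any single lost block is then the
# XOR of all surviving blocks (remaining data plus parity).
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data disks
parity = xor_blocks(data)            # block on the parity disk

# The disk holding data[1] fails: rebuild from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```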
... These device drivers are ported from NetBSD with little modification and accurately account for interrupt handling, synchronization, and request queueing. In addition to these physical devices, Lamix includes the RAIDframe software RAID driver [6] and a pseudo disk device for striping and device concatenation. ...
Article
Execution-driven simulation has become the primary method for evaluating architectural techniques as it facilitates rapid design space exploration without the cost of building prototype hardware. To date, most simulation systems have either focused on the cycle-accurate modeling of user-level code while ignoring operating system and I/O effects, or have modeled complete systems while abstracting away many cycle-accurate timing details. The ML-RSIM simulation system presented here combines detailed hardware models with the ability to simulate user-level as well as operating system activity, making it particularly suitable for exploring the interaction of applications with the operating system and I/O activity. This paper provides an overview of the design of the simulation infrastructure and discusses its strengths and weaknesses in terms of accuracy, flexibility, and performance. A validation study using LMBench microbenchmarks shows a good correlation for most of the architectural characteristics, while operating system effects show a larger variability. By quantifying the accuracy of the simulation tool in various areas, the validation effort not only helps gauge the validity of simulation results but also allows users to assess the suitability of the tool for a particular purpose.
... We are able to compare performance for a wide variety of failure scenarios, from fault-free up to n − 2 failures within a group of n disks for B, as well as fewer failures for RAID-5 and EVENODD. The experiments have been run on RAIDframe [5], a tool allowing various RAID architectures to be evaluated. The RAIDframe synthetic workload generator creates concurrent sequences of disk access requests via multiple client threads. ...
... These small accesses, especially writes, are typical of demanding workloads. The RAIDframe simulator and similar workloads have been utilized in several studies of various data layouts [1,2,5,7,16]. ...
Conference Paper
We present B, a novel data layout method for tolerating multiple disk failures within disk arrays. In a disk array with 2n disks, B tolerates at most 2(n - 1) simultaneous failures; reconstruction work is spread over the surviving disks using only exclusive-or operations. The data layout is based upon B array codes; our approach provides an efficient software implementation. B utilizes the minimal amount of redundant storage space. Our detailed performance comparison with RAID-5 and EVENODD shows B read operations to be very competitive, especially in the presence of failures; B write operations are more expensive than RAID-5 and EVENODD write operations. In the presence of failures, the performance gradually degrades as the number of failures increases.
... Ganger, Worthington, and Patt developed the general-purpose DiskSim storage simulation environment [64] which led to the development of a disk characterization and model creation utility [141] and a publicly-available database of validated simulation models for various disk drive products [56]. Work directed toward disk array simulation includes the Pantheon simulator [174], the raidSim simulator [37,101], and the RAIDframe framework [41]. ...
Article
Timing-accurate storage emulation offers a unique capability: the flexibility of simulation with the reality of experimental measurements. This allows a researcher to experiment with not-yet-existing storage components in the context of real systems executing real applications. A timing-accurate storage emulator appears to the system to be a real storage component with service times matching a model of the component. This allows simulated components to be plugged into real systems, which can be used for application-based experimentation. Additionally, timing-accurate storage emulation offers the opportunity to investigate more expressive interfaces between storage and computer systems. This dissertation identifies a pressing need for a new storage evaluation technique, discusses design issues for achieving accurate per-request service times in a timing-accurate storage emulator, and demonstrates that it is feasible to construct such an emulator. We built a functional timing-accurate storage emulator and explored its use in experiments involving models of existing storage products, experiments evaluating the potential of nonexistent storage components, and experiments evaluating interactions between modified computer systems and expanded storage device functionality. We configured our emulator with device models representing an available production disk drive, a hypothetical 50,000 RPM disk drive, and a hypothetical MEMS-based storage device, and executed application-level workloads against the models. We applied timing-accurate storage emulation in an investigation into storage-based intrusion detection systems. These experiments demonstrate that our emulator accurately reflects the performance of modeled devices, that intrusion detection capabilities can feasibly be included in a standalone processing-enhanced disk drive, and that extensions to existing storage communication paths may be used to transmit and receive information regarding the configuration and operational status of such a device.
... The structure shown in Figure 2.5 is called the left-symmetric organization and is formed by first placing the parity units along the diagonal and then placing the consecutive user data units on consecutive disks at the lowest available offset on each disk. This method for assigning data units to disks assures that, if there are any accesses in the workload large enough to span many stripe units, the maximum possible number of disks will be used to service them [8]. ...
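The placement rule quoted above (parity along the diagonal, then consecutive data units on consecutive disks at the lowest free offset) can be sketched directly; the function name is hypothetical:

```python
# Left-symmetric RAID5 placement per the rule in the text: put parity on
# the diagonal, then assign consecutive data units to consecutive disks at
# the lowest free offset of each disk.
def left_symmetric_layout(n_disks: int, n_rows: int):
    layout = [[None] * n_disks for _ in range(n_rows)]
    for row in range(n_rows):
        layout[row][(n_disks - 1 - row) % n_disks] = "P"
    unit, disk = 0, 0
    while any(None in row for row in layout):
        for row in range(n_rows):          # lowest free offset on this disk
            if layout[row][disk] is None:
                layout[row][disk] = unit
                unit += 1
                break
        disk = (disk + 1) % n_disks
    return layout

for row in left_symmetric_layout(5, 5):
    print(row)
# Row 1 comes out as [5, 6, 7, 'P', 4]: units 0-3 fill row 0, unit 4 drops
# below the parity on the last disk, so a 5-unit read spans all 5 disks.
```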
... It has already been mentioned that there are several constraints imposed upon the sequence of execution of RAID instructions. The order in which such primitive operations are executed is solely a function of the data and control dependencies that exist between them [8]. Similar to a dynamically scheduled processor, which reorders the execution of instructions as long as there are no dependencies between them in order to exploit instruction-level parallelism, a RAID array designer must know the dependencies that exist between primitive RAID operations in order to implement them efficiently. ...
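As a toy illustration of this dependency-driven ordering (operation names follow the common read-old-data/read-old-parity small-write pattern and are not any specific RAIDframe API), the primitives of a RAID5 small write can execute in any topological order of their dependency graph:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph for a RAID5 small write: read old data
# (Rod) and old parity (Rop), XOR them with the new data, then write new
# data (Wnd) and new parity (Wnp). Each key maps an operation to the set
# of operations it depends on.
deps = {
    "Rod": set(),
    "Rop": set(),
    "XOR": {"Rod", "Rop"},
    "Wnd": {"Rod"},
    "Wnp": {"XOR"},
}

order = list(TopologicalSorter(deps).static_order())
# Any order satisfying the dependencies is legal; independent operations
# (e.g. Rod and Rop) may execute concurrently.
assert order.index("XOR") > order.index("Rod")
assert order.index("Wnp") > order.index("XOR")
```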
... Forward error recovery requires anticipating all possible errors and manually coding all actions for completing operations once an error has occurred [8]. This approach usually requires a lot of code and is hard to modify once it has been laid out to handle a given set of errors appropriately. ...
... USB drivers are the currently supported hardware drivers. As an example of the pseudo device category, rump supports the NetBSD software RAID implementation, RAIDframe [4]. In fact, RAIDframe was originally created for prototyping RAID systems and also ran in userspace. ...
Article
BSD-based operating systems implement device drivers in kernel mode for historical, performance, and simplicity reasons. In this paper we extend the Runnable Userspace Meta Program (rump) paradigm of running unmodified kernel code directly in a userspace process to kernel device drivers. Support is available for pseudo device drivers (e.g. RAID, disk encryption, and the Berkeley Packet Filter) and USB hardware drivers (e.g. mass memory, printers, and keyboards). Use cases include driver development, regression testing, safe execution of untrusted drivers, execution on foreign operating systems, and more. The design and NetBSD implementation along with the current status and future directions are discussed.
Preprint
This is a follow-up to the 1994 tutorial by Berkeley RAID researchers, whose 1988 RAID paper foresaw a revolutionary change in the storage industry based on advances in magnetic disk technology, i.e., the replacement of large-capacity expensive disks with arrays of small-capacity inexpensive disks. NAND flash SSDs, which use less power, incur very low latency, provide high bandwidth, and are more reliable than HDDs, are expected to replace HDDs as their prices drop. Replication in the form of mirrored disks and erasure coding via parity and Reed-Solomon codes are two methods to achieve higher reliability through redundancy in disk arrays. RAID(4+k), k=1,2,... arrays utilize k check strips, making them k-disk-failure-tolerant with maximum-distance-separable coding at minimum redundancy. Clustered RAID, local recovery codes, partial MDS, and multilevel RAID are proposals to improve RAID reliability and performance. We discuss RAID5 performance and reliability analysis in conjunction with HDDs without and with latent sector errors (LSEs), which can be dealt with by intradisk redundancy and disk scrubbing, the latter enhanced with machine learning algorithms. Undetected disk errors causing silent data corruption are propagated by rebuild. We utilize the M/G/1 queueing model for RAID5 performance evaluation, present approximations for fork/join response time in degraded-mode analysis, and use the vacationing server model for rebuild analysis. Methods and tools for reliability evaluation with Markov chain modeling and simulation are discussed. Queueing and reliability analysis are based on probability theory and stochastic processes, so the two topics can be studied together. Their application is presented here in the context of RAID arrays in a tutorial manner.
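The M/G/1 model mentioned in this abstract yields the mean response time via the Pollaczek-Khinchine formula; a minimal sketch with illustrative numbers (not taken from the tutorial):

```python
# Pollaczek-Khinchine mean response time for an M/G/1 queue, as used in
# RAID5 performance analysis. Parameter values are illustrative.
def mg1_response_time(lam: float, es: float, es2: float) -> float:
    """lam: arrival rate; es = E[S]; es2 = E[S^2] of the service time."""
    rho = lam * es
    assert rho < 1, "queue must be stable"
    wait = lam * es2 / (2 * (1 - rho))  # mean waiting time in queue
    return es + wait                    # response = service + waiting

# Exponential service with mean 5 ms: E[S^2] = 2 E[S]^2 = 50, so the
# result reduces to the M/M/1 value E[S] / (1 - rho).
print(mg1_response_time(lam=0.1, es=5.0, es2=50.0))  # 10.0
```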