TABLE 1 - uploaded by William V Courtright
First-Order Comparison Between the RAID Levels for an N-disk Array


Source publication
Article
Redundant disk arrays provide highly-available, high-performance disk storage to a wide variety of applications. Because these applications often have distinct cost, performance, capacity, and availability requirements, researchers continue to develop new array architectures. RAIDframe was developed to assist researchers in the implementation and e...

Citations

... To speed up the failure recovery process and provide highly-available arrays, parity declustering was first proposed by Muntz and Lui [16] as a data layout technique, and it was further realized in practical systems by Holland and Gibson [13] based on Balanced Incomplete Block Design (BIBD). Owing to its benefits of fast recovery and high availability, parity declustering has been implemented in the software RAID device driver RAIDframe [6]; it is also deployed in the Panasas file system [25] and in modern erasure-coded storage systems [2]. ...
... To fairly compare the user I/O latency of PDS and round-robin, both scaling schemes should issue the same amount of data migration I/Os in each time slot. Because the volume of data migrated by PDS is less than that migrated by round-robin, when adding n = 1, 2, 3 disks we set sync_speed_min with PDS to 16, 9, and 6.66 MB/s (i.e., 2 × (7 + n)/n MB/s), respectively, so as to ensure approximately the same volume of migrated data as round-robin in each time slot. ...
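The rate formula quoted in this snippet is easy to check; a minimal sketch (the function name is hypothetical, not from the paper) reproducing the arithmetic for an array of 7 existing disks:

```python
# Sketch of the snippet's arithmetic: with 7 existing disks and n added
# disks, PDS migrates a fraction n/(7+n) of what round-robin moves, so its
# per-slot sync rate is scaled up to 2*(7+n)/n MB/s to equalize the volume.
def pds_sync_speed(existing: int, added: int, base_rate_mb: float = 2.0) -> float:
    """Sync rate (MB/s) equalizing migrated volume with round-robin."""
    return base_rate_mb * (existing + added) / added

rates = [round(pds_sync_speed(7, n), 2) for n in (1, 2, 3)]
print(rates)  # [16.0, 9.0, 6.67] -- matching the 16, 9, ~6.66 MB/s above
```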
Article
Parity declustering is widely deployed in erasure-coded storage systems to provide fast recovery and high data availability. However, scaling such RAIDs requires preserving the parity-declustered data layout so that its properties still hold after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied to scale RAIDs with parity declustering. To address this challenge, we develop an efficient online scaling scheme called PDS (Parity Declustering Scaling), which employs an auxiliary Balanced Incomplete Block Design to define the data migration in a way that preserves the parity-declustered data layout. Furthermore, PDS can also be applied to scale RAIDs for improved reliability and/or storage efficiency by optionally allocating more parity blocks and/or data blocks in stripes. We provide theoretical proofs to formally show that PDS preserves the parity-declustered data layout and achieves uniform distributions of data and parity blocks after scaling while requiring only minimal data migration. We implement PDS in Linux kernel 3.14.72 and evaluate its performance with real-world traces. Results show that, compared with a “moving-everything” round-robin approach adapted to achieve a parity-declustered data layout after scaling, PDS reduces scaling time by 82.37 percent and user response time during scaling by 18.25 percent on average.
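To illustrate the kind of BIBD-based placement this abstract refers to (a generic textbook construction, not the paper's PDS algorithm), the (7, 3, 1) design known as the Fano plane can map stripes of width 3 onto 7 disks so that every pair of disks shares exactly one block:

```python
from collections import Counter
from itertools import combinations

# The (7,3,1) BIBD (Fano plane), 0-indexed: 7 disks, stripe width 3.
# Illustrative only -- not the PDS construction from the paper.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def stripe_disks(stripe_id: int) -> tuple:
    """Disks a stripe touches: cycle through the design's blocks."""
    return FANO_BLOCKS[stripe_id % len(FANO_BLOCKS)]

# lambda = 1: every pair of disks co-occurs in exactly one block, so a
# failed disk's reconstruction reads spread evenly over the survivors.
pairs = Counter(p for blk in FANO_BLOCKS for p in combinations(blk, 2))
assert len(pairs) == 21 and all(c == 1 for c in pairs.values())
```

Because each surviving disk shares only one block with the failed disk per pass over the design, rebuild load per disk drops compared with a traditional RAID5 group of the same width.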
... In contrast, simulation models for arrays are very rare. To date, only one functional simulator has been developed, RAIDframe [9], aimed at redundant array software development. Others simulate just a part of the array under study, as in [10], where a simulator is used to study the performance of various striping methods and a buffer caching scheme, or in [7], where one is used to validate analytical model results. ...
Article
This paper presents a detailed development process for a dynamic, discrete-event simulation model of a disk array. It combines hierarchical decomposition with a "bottom-up" approach. In this way, the focus is first set on the elementary storage component, a single disk drive. Once a functional simulation model of the disk drive has been achieved, it is used as the basic storage element for the disk array model. Further, it is explored how to simulate the different interfaces inside a disk array toward the underlying disks. The difference in throughput between the developed model and measurements ranges from 1.5% to 3.16% for writing and from 2.5% to 2.8% for reading, depending on the interface type. However, these results are limited to the workload imposed by the requirements of the ALICE transient storage system, or more precisely, sequential storing and reading of large data files.
... (iv) rotational position modeling and detailed disk layout (2.6%). A less detailed simulator for the same disk drive reported a respectable demerit figure of 3.9% in Kotz et al. [1999]. The Pantheon simulator developed at HP Labs [Wilkes 1996] was used in the evaluation of AutoRAID. RAIDframe is a simulation as well as a rapid prototyping tool for RAID disk arrays [Courtright et al. 1996]. It has been superseded by the DiskSim simulation package [Bucy et al. 2008]. ...
Article
Parity is a popular form of data protection in redundant arrays of inexpensive/independent disks (RAID). RAID5 dedicates one out of N disks to parity to mask single disk failures; that is, the contents of a block on a failed disk can be reconstructed by exclusive-ORing the corresponding blocks on the surviving disks. RAID5 can mask a single disk failure, but it is vulnerable to data loss if a second disk failure occurs. The RAID5 rebuild process systematically reconstructs the contents of a failed disk on a spare disk, returning the system to its original state, but the rebuild may be unsuccessful due to unreadable sectors. This has led to two-disk-failure-tolerant arrays (2DFTs), such as RAID6 based on Reed-Solomon (RS) codes. EVENODD, RDP (Row-Diagonal Parity), the X-code, and RM2 (Row-Matrix) are 2DFTs with parity coding. RM2 incurs a higher level of redundancy than two disks, while the X-code is limited to a prime number of disks. RDP is optimal with respect to the number of XOR operations for encoding, but not for short write operations. For small symbol sizes EVENODD and RDP have the same disk access pattern as RAID6, while RM2 and the X-code incur a high recovery cost with two failed disks. We describe variations on the RAID5 and RAID6 organizations, including clustered RAID, different methods to update parities, rebuild processing, disk scrubbing to eliminate sector errors, and the intra-disk redundancy (IDR) method for dealing with sector errors. We summarize the results of recent studies of failures in hard disk drives. We describe Markov chain reliability models to estimate RAID mean time to data loss (MTTDL), taking into account sector errors and the effect of disk scrubbing. Numerical results show that RAID5 plus IDR attains the same MTTDL level as RAID6 while incurring a lower performance penalty. We conclude with a survey of analytic and simulation studies of RAID performance and of tools and benchmarks for RAID performance evaluation.
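The single-failure reconstruction principle this abstract describes (parity as the XOR of the data blocks) can be sketched in a few lines; block contents and names here are illustrative:

```python
# Parity is the XOR of the data blocks; any single lost block is then the
# XOR of all surviving blocks (remaining data plus parity).
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data disks
parity = xor_blocks(data)            # block on the parity disk

# The disk holding data[1] fails: rebuild from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```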
... These device drivers are ported from NetBSD with little modification and accurately account for interrupt handling, synchronization, and request queueing. In addition to these physical devices, Lamix includes the RAIDframe software RAID driver [6] and a pseudo disk device for striping and device concatenation. ...
Article
Execution-driven simulation has become the primary method for evaluating architectural techniques as it facilitates rapid design space exploration without the cost of building prototype hardware. To date, most simulation systems have either focused on the cycle-accurate modeling of user-level code while ignoring operating system and I/O effects, or have modeled complete systems while abstracting away many cycle-accurate timing details. The ML-RSIM simulation system presented here combines detailed hardware models with the ability to simulate user-level as well as operating system activity, making it particularly suitable for exploring the interaction of applications with the operating system and I/O activity. This paper provides an overview of the design of the simulation infrastructure and discusses its strengths and weaknesses in terms of accuracy, flexibility, and performance. A validation study using LMBench microbenchmarks shows a good correlation for most of the architectural characteristics, while operating system effects show a larger variability. By quantifying the accuracy of the simulation tool in various areas, the validation effort not only helps gauge the validity of simulation results but also allows users to assess the suitability of the tool for a particular purpose.
... We are able to compare performance for a wide variety of failure scenarios, from fault-free up to n − 2 failures within a group of n disks for B, as well as fewer failures for RAID-5 and EVENODD. The experiments have been run on RAIDframe [5], a tool allowing various RAID architectures to be evaluated. The RAIDframe synthetic workload generator creates concurrent sequences of disk access requests via multiple client threads. ...
... These small accesses, especially writes, are typical of demanding workloads. The RAIDframe simulator and similar workloads have been utilized in several studies of various data layouts [1,2,5,7,16]. ...
Conference Paper
We present B, a novel data layout method for tolerating multiple disk failures within disk arrays. In a disk array with 2n disks, B tolerates at most 2(n - 1) simultaneous failures; reconstruction work is spread over the surviving disks using only exclusive-or operations. The data layout is based upon B array codes; our approach provides an efficient software implementation. B utilizes the minimal amount of redundant storage space. Our detailed performance comparison with RAID-5 and EVENODD shows B read operations to be very competitive, especially in the presence of failures; B write operations are more expensive than RAID-5 and EVENODD write operations. In the presence of failures, the performance gradually degrades as the number of failures increases.
... Ganger, Worthington, and Patt developed the general-purpose DiskSim storage simulation environment [64] which led to the development of a disk characterization and model creation utility [141] and a publicly-available database of validated simulation models for various disk drive products [56]. Work directed toward disk array simulation includes the Pantheon simulator [174], the raidSim simulator [37,101], and the RAIDframe framework [41]. ...
Article
Timing-accurate storage emulation offers a unique capability: the flexibility of simulation with the reality of experimental measurements. This allows a researcher to experiment with not-yet-existing storage components in the context of real systems executing real applications. A timing-accurate storage emulator appears to the system to be a real storage component with service times matching a model of the component. This allows simulated components to be plugged into real systems, which can be used for application-based experimentation. Additionally, timing-accurate storage emulation offers the opportunity to investigate more expressive interfaces between storage and computer systems. This dissertation identifies a pressing need for a new storage evaluation technique, discusses design issues for achieving accurate per-request service times in a timing-accurate storage emulator, and demonstrates that it is feasible to construct such an emulator. We built a functional timing-accurate storage emulator and explored its use in experiments involving models of existing storage products, experiments evaluating the potential of nonexistent storage components, and experiments evaluating interactions between modified computer systems and expanded storage device functionality. We configured our emulator with device models representing an available production disk drive, a hypothetical 50,000 RPM disk drive, and a hypothetical MEMS-based storage device, and executed application-level workloads against the models. We applied timing-accurate storage emulation in an investigation into storage-based intrusion detection systems. These experiments demonstrate that our emulator accurately reflects the performance of modeled devices, that intrusion detection capabilities can feasibly be included in a standalone processing-enhanced disk drive, and that extensions to existing storage communication paths may be used to transmit and receive information regarding the configuration and operational status of such a device.
... The structure shown in Figure 2.5 is called the left-symmetric organization and is formed by first placing the parity units along the diagonal and then placing the consecutive user data units on consecutive disks at the lowest available offset on each disk. This method for assigning data units to disks assures that, if there are any accesses in the workload large enough to span many stripe units, the maximum possible number of disks will be used to service them [8]. ...
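The placement rule quoted above (parity along the diagonal, then consecutive data units on consecutive disks at the lowest free offset) can be sketched directly; the function name is hypothetical:

```python
# Left-symmetric RAID5 placement per the rule in the text: put parity on
# the diagonal, then assign consecutive data units to consecutive disks at
# the lowest free offset of each disk.
def left_symmetric_layout(n_disks: int, n_rows: int):
    layout = [[None] * n_disks for _ in range(n_rows)]
    for row in range(n_rows):
        layout[row][(n_disks - 1 - row) % n_disks] = "P"
    unit, disk = 0, 0
    while any(None in row for row in layout):
        for row in range(n_rows):          # lowest free offset on this disk
            if layout[row][disk] is None:
                layout[row][disk] = unit
                unit += 1
                break
        disk = (disk + 1) % n_disks
    return layout

for row in left_symmetric_layout(5, 5):
    print(row)
# Row 1 comes out as [5, 6, 7, 'P', 4]: units 0-3 fill row 0, unit 4 drops
# below the parity on the last disk, so a 5-unit read spans all 5 disks.
```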
... It has already been mentioned that there are several constraints imposed upon the sequence of execution of RAID instructions. The order in which such primitive operations are executed is solely a function of the data and control dependencies that exist between them [8]. Similar to a dynamically scheduled processor, which reorders the execution of instructions as long as there are no dependencies between them in order to exploit instruction-level parallelism, a RAID array designer must know the dependencies that exist between primitive RAID operations in order to implement them efficiently. ...
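As a toy illustration of this dependency-driven ordering (operation names follow the common read-old-data/read-old-parity small-write pattern and are not any specific RAIDframe API), the primitives of a RAID5 small write can execute in any topological order of their dependency graph:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph for a RAID5 small write: read old data
# (Rod) and old parity (Rop), XOR them with the new data, then write new
# data (Wnd) and new parity (Wnp). Each key maps an operation to the set
# of operations it depends on.
deps = {
    "Rod": set(),
    "Rop": set(),
    "XOR": {"Rod", "Rop"},
    "Wnd": {"Rod"},
    "Wnp": {"XOR"},
}

order = list(TopologicalSorter(deps).static_order())
# Any order satisfying the dependencies is legal; independent operations
# (e.g. Rod and Rop) may execute concurrently.
assert order.index("XOR") > order.index("Rod")
assert order.index("Wnp") > order.index("XOR")
```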
... Forward error recovery requires anticipating all possible errors and manually coding all actions for completing operations once an error has occurred [8]. This approach usually requires a lot of code and is hard to modify once it has been laid out to handle a given set of errors appropriately. ...
... USB drivers are the currently supported hardware drivers. As an example of the pseudo device category, rump supports the NetBSD software RAID implementation, RAIDframe [4]. In fact, RAIDframe was originally created for prototyping RAID systems and also ran in userspace. ...
Article
BSD-based operating systems implement device drivers in kernel mode for historical, performance, and simplicity reasons. In this paper we extend the Runnable Userspace Meta Program (rump) paradigm of running unmodified kernel code directly in a userspace process to kernel device drivers. Support is available for pseudo device drivers (e.g. RAID, disk encryption, and the Berkeley Packet Filter) and USB hardware drivers (e.g. mass memory, printers, and keyboards). Use cases include driver development, regression testing, safe execution of untrusted drivers, execution on foreign operating systems, and more. The design and NetBSD implementation along with the current status and future directions are discussed.
Preprint
This is a follow-up to the 1994 tutorial by Berkeley RAID researchers, whose 1988 RAID paper foresaw a revolutionary change in the storage industry based on advances in magnetic disk technology, i.e., the replacement of large-capacity expensive disks with arrays of small-capacity inexpensive disks. NAND flash SSDs, which use less power, incur very low latency, provide high bandwidth, and are more reliable than HDDs, are expected to replace HDDs as their prices drop. Replication in the form of mirrored disks and erasure coding via parity and Reed-Solomon codes are two methods to achieve higher reliability through redundancy in disk arrays. RAID(4+k), k=1,2,... arrays utilize k check strips, making them k-disk-failure-tolerant with maximum-distance-separable coding at minimum redundancy. Clustered RAID, local recovery codes, partial MDS, and multilevel RAID are proposals to improve RAID reliability and performance. We discuss RAID5 performance and reliability analysis in conjunction with HDDs without and with latent sector errors (LSEs), which can be dealt with by intradisk redundancy and disk scrubbing, the latter enhanced with machine learning algorithms. Undetected disk errors causing silent data corruption are propagated by rebuild. We utilize the M/G/1 queueing model for RAID5 performance evaluation, present approximations for fork/join response time in degraded-mode analysis, and use the vacationing server model for rebuild analysis. Methods and tools for reliability evaluation with Markov chain modeling and simulation are discussed. Queueing and reliability analysis are based on probability theory and stochastic processes, so the two topics can be studied together. Their application is presented here in the context of RAID arrays in a tutorial manner.
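The M/G/1 model mentioned in this abstract yields the mean response time via the Pollaczek-Khinchine formula; a minimal sketch with illustrative numbers (not taken from the tutorial):

```python
# Pollaczek-Khinchine mean response time for an M/G/1 queue, as used in
# RAID5 performance analysis. Parameter values are illustrative.
def mg1_response_time(lam: float, es: float, es2: float) -> float:
    """lam: arrival rate; es = E[S]; es2 = E[S^2] of the service time."""
    rho = lam * es
    assert rho < 1, "queue must be stable"
    wait = lam * es2 / (2 * (1 - rho))  # mean waiting time in queue
    return es + wait                    # response = service + waiting

# Exponential service with mean 5 ms: E[S^2] = 2 E[S]^2 = 50, so the
# result reduces to the M/M/1 value E[S] / (1 - rho).
print(mg1_response_time(lam=0.1, es=5.0, es2=50.0))  # 10.0
```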