Architecture of storage area network system 


Source publication
Conference Paper
Full-text available
The speed of storing and fetching data on SCSI disks places a great restriction on the efficiency of a SAN based on a Fibre Channel network. In this paper, a high-efficiency FC-SAN storage method derived from statistical conclusions about the file system workload is designed and implemented. By evaluating whether the load is heavy or light, the system selects DISK o...

Context in source publication

Context 1
... out SCSI commands and messages from the SCSI command queue and the message queue, respectively. If a SCSI command instructs a read or write of a large amount of data, the processing thread of the SCSI target simulator passes the command to the SCSI subsystem in the layer immediately below, which fully processes the command, performs the SCSI disk operations, and waits for the subsystem to return the results. Other commands are transferred by the processing thread of the SCSI target simulator to the I/O transfer layer, where each command is translated into block I/O requests and passed to the RAMDISK driver for in-memory processing. The devices in the SCSI target simulator are mapped one-to-one to SCSI disks (Target:Lun) on the initiator. A single RAMDISK device is used as a whole, rather than allocating a separate RAMDISK device to each SCSI disk device; the RAMDISK is therefore divided into many sections, and there are different ways to map RAMDISK sections to SCSI disk devices: one section to all disks, several sections to all disks, or one section per disk, namely 1:N, M:N, or N:N. Here the N:N mapping is applied, in which each RAMDISK section is mapped to one SCSI disk and handles its small-file and file-attribute operations. Processing SCSI commands asynchronously in these threads greatly improves efficiency, and concurrent modification of the RAMDISK properties in the device-mapping table of the SCSI target simulator makes it convenient to reallocate RAMDISK devices and sections. On the other hand, since operations on the RAMDISK are very fast, a synchronous mode can be used to store and fetch its data; this preserves the order of RAMDISK operations issued by the client and guarantees the consistency of the RAMDISK with the file system.

In the implementation, the threshold sizes that decide whether read and write operations go to the RAMDISK can be set flexibly and separately, according to the statistical properties of the application's file system workload. The size of the RAMDISK is generally constant, and it is divided into multiple sections, each corresponding to the small-file and file-attribute operations of one SCSI disk. If the threshold is set too large, the RAMDISK area assigned to each SCSI disk is consumed too quickly; if it is set too small, most read and write requests are sent to the SCSI disks, hurting the efficiency of the system as a whole. For instance, in database and web server applications the block size is generally 8KB and such blocks account for up to 50% of the total, so the threshold can be set to 16KB and the RAMDISK divided into 8 sections with a capacity of 64MB each, giving quite good efficiency. A larger threshold can be set when much larger files are requested, for example in a VOD service system. In a future system, the RAMDISK can not only be of larger capacity, but a dynamically self-adjusting threshold can be applied as well.

Since RAM, a volatile medium, is one of the main storage devices in such an FC-SAN system based on physical disks and a RAMDISK, the RAM-based disk requires special handling at initialization, on normal shutdown, and under abnormal conditions. Therefore, in the real system, a DOM with a capacity of 512MB is selected as the persistent storage medium backing the RAM-based disk, so that its contents can be copied effectively even under continual I/O requests.
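The routing rule described above, attribute operations and sub-threshold transfers to the RAMDISK, larger transfers to the SCSI subsystem, is simple enough to sketch. The following user-space C sketch is illustrative only; every identifier in it (io_request, dispatch, ramdisk_submit, scsi_disk_submit) is invented rather than taken from the TH-MSNS source:

```c
/* A minimal, user-space sketch of the threshold-based dispatch
 * described above. All identifiers are illustrative stand-ins,
 * not names from the TH-MSNS source. */
#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

#define THRESHOLD_BYTES (16 * 1024)  /* 16KB, the paper's DB/web example */

struct io_request {
    size_t len;         /* bytes to read or write */
    bool   is_attr_op;  /* file-attribute (metadata) operation? */
};

/* Stubs standing in for the two back ends. */
static void ramdisk_submit(const struct io_request *r)
{
    printf("RAMDISK   <- %zu bytes\n", r->len);  /* synchronous, in memory */
}

static void scsi_disk_submit(const struct io_request *r)
{
    printf("SCSI disk <- %zu bytes\n", r->len);  /* asynchronous, on disk */
}

/* Attribute operations and small transfers go to the RAMDISK section;
 * transfers at or above the threshold go to the SCSI subsystem. */
static void dispatch(const struct io_request *r)
{
    if (r->is_attr_op || r->len < THRESHOLD_BYTES)
        ramdisk_submit(r);
    else
        scsi_disk_submit(r);
}

int main(void)
{
    struct io_request small = { .len = 8 * 1024,  .is_attr_op = false };
    struct io_request large = { .len = 64 * 1024, .is_attr_op = false };
    dispatch(&small);  /* -> RAMDISK   */
    dispatch(&large);  /* -> SCSI disk */
    return 0;
}
```

With the 16KB threshold of the database/web example, the 8KB request goes to the RAMDISK and the 64KB request to the SCSI subsystem.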
At system initialization, the program dd (a block-copy program on UNIX-like systems) is used to copy the entire contents of the DOM directly to the RAM-based disk. While the system is running, three methods can be adopted for copying files back: a file-compression method, a direct full-copy method, and a LOG file system. The real system uses the second method, at a rate of one copy per 60*6 seconds, i.e. the full 512MB is copied from RAM to the DOM every 360 seconds. A copying thread running in the background handles all the tasks of copying data from RAM to the DOM. In addition, the DOM is a flash-based IDE disk, so copying 512MB from RAM with this background thread occupies very little CPU and therefore has no noticeable impact on system efficiency. In practice there is a trade-off between the copy window and processing capacity, because data inconsistency can occur when the interval between two consecutive copy tasks is large.

The test system was set up in a laboratory environment. Seven Xeon nodes serve as server nodes, i.e. the clients of the SAN that access the storage resources. Each node has 4 CPUs in SMP mode, 1GB of DDR RAM, and a 36GB disk, and runs RedHat Linux 7.3. An FC network with 2Gbps bandwidth is used as the data transfer network. There is one storage node, with two 2.4GHz Xeon CPUs, 1GB of memory, and a 7*73GB disk array; a single SCSI bus connects the disks to the storage node, which runs a special embedded operating system. The test hardware environment is shown in Fig 4. The 7 physical disks in the disk array are mapped through the storage network to the front-end host nodes, one disk per host, so every host has one network disk to operate on. This mapping is transparent to the host file system, and the network disk behaves logically like a local disk.

IOMETER is used as the testing software. There are two test groups: one tests the system without the RAMDISK technique, the other with it. The file-size threshold is set to 64KB, and we assume the RAMDISK has been filled with the files to be read before the test starts, so all requests for data smaller than 64KB are served by the RAMDISK. The results are shown in Fig. 5. They show that, without the RAMDISK, the smaller the files are, the worse the system performance is, and that performance increases considerably after adopting the RAMDISK technique. This is because a large number of small-file operations spends a great deal of time seeking; especially when the files are very small and scattered across the disk, the cost in disk read performance is high.

With the file size still set to 64KB and a sequential write pattern, the results are shown in Fig. 6. They show that write performance clearly increases after using the RAMDISK, although the improvement in the 2KB to 64KB range is similar and not very pronounced. The reason is that the RAID subsystem in the storage node uses a write-back cache: an I/O operation completes as soon as the data has been written into the memory of the SCSI-RAID adapter, and the adapter's firmware moves the data to the physical disks later to obtain better write performance. Generally, the SCSI-RAID memory assigned to one disk is 64KB, so writes of data smaller than 64KB complete in the cache. That is why, without the RAMDISK, write performance is better than read performance.
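Returning to the persistence mechanism at the start of this passage: the periodic RAM-to-DOM copy can be sketched as a background thread. The sketch below is user-space C with invented names (copier, ramdisk_image, and the file path dom.img are all hypothetical); the real system performs the copy in kernel space against a 512MB flash DOM. Compile with -lpthread.

```c
/* A user-space sketch of the background RAM-to-DOM copying thread:
 * the direct full-copy method, one complete image every 360 seconds. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define IMAGE_SIZE (512UL * 1024 * 1024)  /* 512MB RAMDISK image    */
#define INTERVAL_S 360                    /* 60*6 seconds per copy  */

static unsigned char *ramdisk_image;      /* stands in for the RAMDISK */

static void *copier(void *arg)
{
    const char *dom_path = arg;           /* hypothetical DOM backing file */
    for (;;) {
        sleep(INTERVAL_S);
        FILE *dom = fopen(dom_path, "wb");
        if (!dom)
            continue;                     /* retry on the next cycle */
        fwrite(ramdisk_image, 1, IMAGE_SIZE, dom);  /* full-copy method */
        fclose(dom);
    }
    return NULL;
}

int main(void)
{
    ramdisk_image = calloc(1, IMAGE_SIZE);
    if (!ramdisk_image)
        return 1;
    pthread_t tid;
    pthread_create(&tid, NULL, copier, "dom.img");
    /* ... the foreground I/O service would run here ... */
    pthread_join(tid, NULL);
    return 0;
}
```

The copy interval is the trade-off knob mentioned above: a longer interval costs less processing but widens the window in which RAM and DOM can diverge.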
Besides the cache effect just described, the 200MB/s bandwidth of FC also limits the improvement in write performance. With the RAMDISK, write performance improves by 10%~40% and reaches 185MB/s, 92% of the FC bandwidth, which is close to the theoretical value.

In summary, an improved FC-SAN based on the rules of the file system workload has been designed and implemented, using both RAMDISK devices and DISK devices as the storage devices of a network storage system. By evaluating whether the load is heavy or light, the system selects DISK or RAM devices as the main media for I/O operations: file operations on larger data are performed on the disk devices, while reads and writes of smaller data and file-attribute operations are performed on the RAMDISK devices. The evaluation results show that, without the RAMDISK, the smaller the file operations are, the worse the performance is. In the system above, read performance is 124MB/s and write performance is 164MB/s; after using the RAMDISK, read performance improves to 170MB/s and write performance to 188MB/s. We can therefore conclude that, after adopting the RAMDISK, read performance improves by 50% to 100% and write performance by 10% to 40%, and write operations can use 92% of the Fibre Channel bandwidth. The more read requests there are in a mixed request stream and the smaller the blocks are, the more the system performance improves. The system reduces seek time and the number of physical disk operations, narrows the speed gap between the CPU and the disks, and improves the performance of the system as a whole.

Acknowledgement. The work described in this paper is supported by the National High-Tech Research and Development Plan of China under Grant ...

Citations

... The emulator is implemented on an in-house SAN system, the TH-MSNS [9,10], in which the target simulator module is a key component [11]. The target simulator module shown in Fig. 1 runs on the target side, processing and responding to SCSI requests. ...
Article
A SCSI target emulator is used in a storage area network (SAN) environment to simulate the behavior of a SCSI target, processing and responding to I/O requests issued by initiators. The SCSI target emulator works with general storage devices over multiple transport protocols. The target emulator uses a protocol conversion module that translates SCSI protocols for a variety of storage devices and implements multi-RAID-level configuration and storage virtualization functions. Moreover, the target emulator implements RAM caching, multi-queuing, and request merging to effectively improve the I/O response speed of general storage devices. The throughput and average response times of the target emulator for block sizes of 4 KB to 128 KB are 150% faster for reads and 67% faster for writes than those of the existing emulator. With a block size of 16 KB, the I/O latency of the target emulator is only about 20% of that of the existing emulator.
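Request merging, mentioned in this abstract, is a standard block-layer technique; the following is a minimal generic illustration in C, not the emulator's actual code, and the names (blk_req, try_merge) are invented. Two queued requests that are contiguous on disk and go in the same direction are coalesced into one larger request.

```c
/* A minimal sketch of request merging as a generic technique. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct blk_req {
    uint64_t lba;      /* starting logical block address */
    uint32_t nblocks;  /* length in blocks               */
    bool     write;    /* direction                      */
};

/* Merge b into a if b starts exactly where a ends and both requests
 * go the same direction; returns true on success. */
static bool try_merge(struct blk_req *a, const struct blk_req *b)
{
    if (a->write == b->write && a->lba + a->nblocks == b->lba) {
        a->nblocks += b->nblocks;
        return true;
    }
    return false;
}

int main(void)
{
    struct blk_req a = { .lba = 100, .nblocks = 8, .write = false };
    struct blk_req b = { .lba = 108, .nblocks = 8, .write = false };
    if (try_merge(&a, &b))
        printf("merged: lba=%llu len=%u\n",
               (unsigned long long)a.lba, a.nblocks);  /* lba=100 len=16 */
    return 0;
}
```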
... In this paper, a storage area network (SAN) system, the TH-MSNS (TsingHua Mass Storage Network System) [7], [8], [9] is designed and implemented. The system is based on Linux SCSI and FCP and its storage node has cluster or multiprocessor architecture. ...
... Then, the SCSI target handles all SCSI commands. In the implementation, we chose the SCSI-RAID subsystem to provide an SCSI disk pool to consolidate storage, which passes SCSI commands on to its firmware to complete the final step of the I/O request [7], [11]. ...
Article
With the increasing demand for vast storage repositories, network storage has become important for mass data storage and processing, scalable addressing and availability, and the quality of service and security of data storage. This situation demands the emergence of new technology in the data storage field. In this paper, TH-MSNS, a SAN system, is introduced. This system was designed and implemented based on the Fibre Channel protocol, and its I/O route was tested. This paper introduces some of the key techniques in the network storage system, including a SCSI target simulator, an intelligent and uniform storage management architecture, and the processing flow of read/write commands. The software for the new storage area network system was implemented as a kernel-mode module to improve its efficiency. The SCSI target adopts a layered design and standardized interfaces, so it is compatible with various types of SCSI devices and can use different network protocols. The storage management software adopts a distributed architecture, which enables higher interoperability and compatibility with various kinds of management protocols. TH-MSNS offers high adaptability, efficiency, scalability, and compatibility, and is easy to maintain.
... We have established a self-developed storage network, the TH-MSNS [9]. Based on it, we implemented some effective data optimizing mechanisms for the storage system. ...
Conference Paper
One of the most effective ways to improve the I/O performance of a storage system is to enhance the hard disk's read/write ability. We used an I/O processing node in the storage network to optimize data organization and I/O performance. By analyzing existing algorithms and the different requirements of read and write operations, we designed an improved optimizing algorithm to schedule disk I/O requests. It selects the closest request in the queue to process first, and uses an EW mechanism to modify write locations. Typically, the algorithm can reduce a disk's average response time by about 15%-17%. This paper also presents an EW stripe-and-copy algorithm that can improve I/O performance using parallel disk accesses, and enhance reliability through data duplication. With one copy preserved, it can reduce the response time by about 30%.
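The "closest request first" selection this abstract describes is a shortest-seek-time-first variant. A minimal sketch of that selection step follows; the names (pick_closest, struct req) are invented, and the EW write-relocation mechanism is not reproduced.

```c
/* Pick the queued request whose cylinder is nearest the current
 * head position (shortest-seek-time-first selection). */
#include <stdio.h>
#include <stdlib.h>

struct req { long cylinder; };

static int pick_closest(const struct req *q, int n, long head)
{
    int best = 0;
    long best_d = labs(q[0].cylinder - head);
    for (int i = 1; i < n; i++) {
        long d = labs(q[i].cylinder - head);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}

int main(void)
{
    struct req q[] = { {95}, {180}, {34}, {119} };
    long head = 100;
    int i = pick_closest(q, 4, head);
    printf("serve request at cylinder %ld\n", q[i].cylinder);  /* 95 */
    return 0;
}
```

Pure closest-first selection can starve distant requests, which is one reason the paper pairs it with a mechanism for modifying write locations rather than using it alone.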
Conference Paper
Using advanced GIS technology and Internet technology to publish and share surveying and mapping spatial data on the Web, providing users with spatial data browsing, query, and analysis functions, has become an inevitable trend in the development of GIS. With a surveying data processing system based on network computing, users can perform real-time, dynamic network adjustment computation and data management, which makes surveying and mapping teaching and production more convenient, improves the efficiency of surveying data processing, and provides a practical data processing platform for surveying and mapping personnel training and production. This has very important practical significance, and it will have a profound influence on professional surveying and mapping, geographic information systems, and the whole information industry.
Conference Paper
The mechanical nature of magnetic disks limits the possibility of significant improvement in the I/O performance of the magnetic disk storage systems currently in use. The use of magnetic disk storage systems has become an obstacle to the performance growth of critical applications. This paper describes an implementation of a remote non-volatile RAM disk (abbreviated NVDisk) over a Fibre Channel network. Read and write latencies are drastically reduced, and the I/O performance of the storage system is thus improved by orders of magnitude. We implemented an NVDisk target driver that provides full standard SCSI command set support, so a virtual disk can be constructed for use in the storage area network. NVDisk does not engage the foreground server's CPU and main memory resources, so it can sustain extremely heavy workloads. In addition, we implemented a Virtual Disk (VD) module in the Linux kernel, which uses a memory pool and backup disks to form a transparent virtual appliance and encapsulate the ramdisk. With this, snapshot-based online backup mechanisms can be carried out. The whole system was built in an FC SAN environment, so the NVDisk scales well and can easily be shared between servers.
Conference Paper
Full-text available
Logical Volume Manager (LVM) has been a key subsystem for online disk storage management. An additional layer is created in the kernel to present a logical view of the physical storage devices. Many transparent functions can be implemented between the logical and physical layers, such as merging several physical disks into a larger logical device, or resizing logical devices without stopping the system. In a logical volume group, files can be striped across several physical disks to achieve high I/O performance. But data I/O parallelism by itself does not guarantee optimal application performance, since higher data throughput does not necessarily result in better application performance. This paper studied dynamic load balancing and data redistribution algorithms in the storage virtualization layer for when the load becomes imbalanced across the disks due to access pattern fluctuation. An extension of the heuristic load balancing method was proposed for the storage virtualization subsystem of the Tsinghua Mass Storage Network System (TH-MSNS). Logical volume I/O request status is monitored, and the physical disks are sorted according to the number of Logical Extent (LE) accesses per time unit. The I/O operations on an LE of the hottest disk are transparently migrated to other disks. Preliminary performance simulations under a WWW server file access workload give satisfactory results for the proposed cooling algorithm in storage virtualization systems.
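A compact sketch of that cooling heuristic follows: pick the hottest disk by LE accesses per time unit and migrate one extent toward the coolest disk. The data structures and the hottest/coolest helpers are illustrative inventions, not the TH-MSNS storage virtualization code.

```c
/* Sketch of the cooling heuristic: migrate an extent off the disk
 * with the highest Logical Extent access rate. */
#include <stdio.h>

#define NDISKS 4

struct disk { int id; unsigned le_accesses; };  /* LE accesses per unit time */

static int hottest(const struct disk *d, int n)
{
    int h = 0;
    for (int i = 1; i < n; i++)
        if (d[i].le_accesses > d[h].le_accesses) h = i;
    return h;
}

static int coolest(const struct disk *d, int n)
{
    int c = 0;
    for (int i = 1; i < n; i++)
        if (d[i].le_accesses < d[c].le_accesses) c = i;
    return c;
}

int main(void)
{
    struct disk d[NDISKS] = { {0, 420}, {1, 80}, {2, 950}, {3, 150} };
    int from = hottest(d, NDISKS), to = coolest(d, NDISKS);
    /* The real system would now transparently remap one hot LE's blocks
     * from 'from' to 'to' and update the logical volume mapping table. */
    printf("migrate one LE from disk %d to disk %d\n", d[from].id, d[to].id);
    return 0;
}
```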
Conference Paper
Full-text available
Multipath provides multiple paths between application servers and storage devices. Multipath can overcome single points of failure and improve a system's reliability and availability. This paper presents a multi-layer Multipath design, and describes the design and implementation of a Multipath system in a storage area network (SAN). For the application server, we implemented Multipath in the volume management layer; for the storage server, we implemented Multipath in the SCSI Middle Level layer. This system can make the most of the storage server's characteristics to reduce the time needed to discover and locate failures, and it is independent of the underlying SCSI cards and storage devices, so it has good compatibility. This paper also proposes methods for choosing paths, automatically recovering paths, and balancing the load. We tested the read performance and the average response time, and the results showed that with the load balanced, read performance improves by 17.9% on average, and the average response time decreases by 15.2% on average.
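The abstract does not spell out its path-selection policy, so the following is only a generic illustration of one common choice, sending each I/O down the live path with the fewest outstanding requests; all names here are invented.

```c
/* Generic multipath selection: prefer the live path with the fewest
 * in-flight I/Os; skip dead paths entirely (failover). */
#include <stdbool.h>
#include <stdio.h>

#define NPATHS 2

struct path { bool alive; unsigned inflight; };

/* Return the index of a usable path, or -1 if all paths are down. */
static int select_path(const struct path *p, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!p[i].alive)
            continue;
        if (best < 0 || p[i].inflight < p[best].inflight)
            best = i;
    }
    return best;
}

int main(void)
{
    struct path paths[NPATHS] = { { true, 12 }, { true, 3 } };
    printf("send I/O via path %d\n", select_path(paths, NPATHS));  /* 1 */
    paths[1].alive = false;                  /* simulate a path failure */
    printf("send I/O via path %d\n", select_path(paths, NPATHS));  /* 0 */
    return 0;
}
```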
Conference Paper
Full-text available
With increasing storage scale and complexity, storage management for heterogeneous environments is becoming more and more important. This paper introduces VTarget, an improved software target emulator integrated with storage virtualization management, and explains some of its key technologies in detail. The VTarget system can manage various heterogeneous storage resources and provides a single virtualization interface for storage management. It is implemented in the storage network layer of the SAN and can support heterogeneous operating systems. The VTarget system also provides an access control mechanism to enhance device security and adopts multiple-metadata-copy technology to improve reliability. We have implemented a prototype of the VTarget system, and the testing results showed that storage virtualization management reduces I/O bandwidth by less than 3.6% and increases latency by less than 8%, which is only a very slight effect on storage performance for the management of mass heterogeneous storage.
Conference Paper
Remote mirroring ensures that all data written to a primary storage device is also written to a remote secondary storage device, to support disaster recoverability. In this study, we designed and implemented storage-based synchronous remote mirroring for SAN-attached storage nodes. Taking advantage of the high bandwidth and long-distance linking ability of dedicated fiber connections, this approach maintains a consistent and up-to-date copy in a remote location to meet the demands of disaster recovery. The system imposes no host or application overhead and is independent of the actual storage unit. In addition, we present a disk failover solution. The performance results indicate that the bandwidth of the storage node with mirroring under a heavy load was 98.67% of the bandwidth without mirroring, only a slight performance loss. This means that our synchronous remote mirroring has little impact on the host's average response time and the actual bandwidth of the storage node.
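The synchronous write rule described above, acknowledging the host only after both copies are durable, can be summarized in a few lines. This is a schematic sketch with invented stub functions (primary_write, secondary_write), not the paper's implementation.

```c
/* Sketch of the synchronous-mirroring write path: a write completes
 * only after both the primary and the remote secondary acknowledge it. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool primary_write(const void *buf, size_t len)
{
    (void)buf; (void)len;  /* stub: write to the local storage unit */
    return true;
}

static bool secondary_write(const void *buf, size_t len)
{
    (void)buf; (void)len;  /* stub: write over the dedicated fiber link */
    return true;
}

/* Acknowledge the host only when both copies are durable; on a
 * secondary failure, the real system would invoke its disk failover
 * path instead of simply reporting an error. */
static bool mirrored_write(const void *buf, size_t len)
{
    if (!primary_write(buf, len))
        return false;
    return secondary_write(buf, len);
}

int main(void)
{
    char data[4096] = {0};
    printf("write %s\n", mirrored_write(data, sizeof data) ? "ok" : "failed");
    return 0;
}
```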