Hot sparing is a process in which a spare drive in a RAID array temporarily replaces a failed disk drive by taking over its identity. When a new disk drive is added to the system to replace the failed one, data from the hot spare is copied to it, and the hot spare returns to its idle state, ready to replace the next failed drive. Alternatively, the hot spare replaces the failed disk drive permanently. In that case it is no longer available as a hot spare, and a new hot spare must be configured on the storage system.
A hot spare should be large enough to accommodate data from a failed drive. Some systems implement multiple hot spares to improve data availability.
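The sizing rule above can be illustrated with a small selection routine. This is only a sketch: the function name `pick_hot_spare` and the "smallest spare that fits" policy are assumptions made for illustration, not the behavior of any particular RAID controller.

```python
from typing import Optional, Sequence


def pick_hot_spare(failed_capacity_gb: int,
                   spare_capacities_gb: Sequence[int]) -> Optional[int]:
    """Return the index of the smallest spare that can hold the failed
    drive's data, or None if no spare is large enough.

    Hypothetical policy: prefer the smallest adequate spare so that
    larger spares stay available for larger drives.
    """
    candidates = [
        (capacity, index)
        for index, capacity in enumerate(spare_capacities_gb)
        if capacity >= failed_capacity_gb
    ]
    if not candidates:
        return None
    # min() orders by capacity first, then by index for ties.
    return min(candidates)[1]
```

With two or more spares configured, a routine like this can be called once per failure, removing the chosen spare from the pool each time.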
A hot spare can be configured as automatic or user-initiated, which specifies how it will be used in the event of disk failure. In an automatic configuration, when the recoverable error rate for a disk exceeds a predetermined threshold, the disk subsystem tries to copy data from the failing disk to the hot spare automatically. If this task is completed before the damaged disk fails, the subsystem switches to the hot spare and marks the failing disk as unusable. Otherwise, it uses parity or the mirrored disk to recover the data. In a user-initiated configuration, the administrator has control of the rebuild process. For example, the rebuild could be scheduled to run overnight to prevent any degradation of system performance. However, until the rebuild completes, the system is at risk of data loss if another disk fails.
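The two configurations can be sketched as a small decision function. The names used here (`SpareMode`, `next_action`, and the action strings) are hypothetical and chosen only to mirror the description above. Real disk subsystems expose this logic differently.

```python
from enum import Enum


class SpareMode(Enum):
    AUTOMATIC = "automatic"
    USER_INITIATED = "user_initiated"


def next_action(mode: SpareMode,
                recoverable_errors: int,
                threshold: int,
                disk_failed: bool,
                copy_done: bool) -> str:
    """Return the next step for one monitored disk (illustrative only)."""
    if mode is SpareMode.USER_INITIATED:
        # The administrator decides when the rebuild runs.
        return "wait_for_administrator" if disk_failed else "monitor"

    # Automatic mode.
    if copy_done:
        # Proactive copy finished: switch over, mark the old disk unusable.
        return "switch_to_spare"
    if disk_failed:
        # The disk failed before the copy completed: fall back to
        # parity or the mirrored disk.
        return "rebuild_from_redundancy"
    if recoverable_errors > threshold:
        # Error rate crossed the threshold: start or continue the copy.
        return "copy_to_spare"
    return "monitor"
```

For example, `next_action(SpareMode.AUTOMATIC, 12, 10, False, False)` yields `"copy_to_spare"`, while the same inputs in user-initiated mode yield `"monitor"`.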
Some RAID arrays contain a spare drive that is referred to as a hot spare or an online spare. The hot spare operates in standby mode, usually powered on but not in use under normal operating conditions, and is automatically brought into action by the RAID controller when a drive fails.
The major use of hot spares is to enable RAID sets to start rebuilding automatically as soon as possible. The process of rebuilding to a hot-spare drive is often referred to as sparing out. Some modern storage arrays do not have dedicated physical hot-spare drives; instead, they reserve a small amount of space on each drive in the array and set this space aside to be used in the event of drive failures. This is referred to as distributed sparing.
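As a rough illustration of distributed sparing, the sketch below estimates how much space each drive would need to reserve. It rests on simplifying assumptions that are not stated in the text: all drives are the same size, data is spread evenly, and placement constraints of the RAID layout are ignored. The function name `reserve_per_drive_gb` is hypothetical.

```python
def reserve_per_drive_gb(drive_capacity_gb: float,
                         drive_count: int,
                         failures_to_absorb: int = 1) -> float:
    """Estimate the spare space to reserve on each drive.

    With reserve r on each of n drives of capacity C, each failed drive
    holds (C - r) of data and the (n - k) survivors offer (n - k) * r of
    spare space. Solving k * (C - r) <= (n - k) * r gives r >= k * C / n.
    """
    if not 0 < failures_to_absorb < drive_count:
        raise ValueError("failures_to_absorb must be between 1 and drive_count - 1")
    return failures_to_absorb * drive_capacity_gb / drive_count
```

Under these assumptions, ten 1000 GB drives sized for one failure would each reserve 100 GB: the failed drive holds 900 GB of data, and the nine survivors together provide 900 GB of spare space.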
With a hot spare, one of the following methods of data recovery is performed, depending on the RAID implementation:
- If parity RAID is used, the data is rebuilt onto the hot spare from the parity and the data on the surviving disk drives in the RAID set.
- If mirroring is used, the data is copied from the surviving mirror onto the hot spare.
Drive copy rebuilds are computationally simple compared to parity rebuilds, so they are faster than reconstructing data from parity. RAID 1 mirror sets always recover with a drive copy, whereas parity-based RAID levels such as RAID 5 and RAID 6 usually have to recover data through a parity rebuild.
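The difference between the two recovery methods can be seen in a minimal sketch. It assumes single-parity XOR as used in RAID 5 and works on one stripe held in memory. The function names are illustrative, and the second parity of RAID 6 uses different arithmetic that is not shown here.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_from_parity(surviving_blocks: Sequence[bytes]) -> bytes:
    """Parity rebuild: XOR the surviving data blocks and the parity block
    of a stripe to regenerate the block that was on the failed drive."""
    return xor_blocks(surviving_blocks)


def rebuild_from_mirror(surviving_copy: bytes) -> bytes:
    """Drive copy: the surviving mirror already holds the data, so the
    rebuild is a straight copy with no computation."""
    return bytes(surviving_copy)


# One stripe with three data blocks and their XOR parity.
d0, d1, d2 = b"\x01\x02", b"\x0f\x10", b"\xaa\x55"
parity = xor_blocks([d0, d1, d2])

# The drive holding d1 fails; rebuild its block onto the hot spare.
recovered = rebuild_from_parity([d0, d2, parity])
```

The parity path must read every surviving drive in the RAID set and compute each block, while the mirror path reads a single drive, which is why drive copies tend to finish sooner.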