Incorrect RAID drive substitution


When a disk in a modern multi-drive RAID array fails and a hot spare is installed, the controller will “pick up” the spare, mark the failed drive as bad, and rebuild the array onto the spare. The failed drive should then be replaced as soon as possible.

The problems with drive substitution start occurring when there is no hot spare installed, or if the controller does not automatically proceed as above.

It is then up to the RAID administrator to remove the bad drive, insert a new drive, and manually start the rebuild.

Sometimes, however, the admin removes a perfectly good drive, replaces it with a new, perfectly good drive, and forces a rebuild. If the controller allows this to proceed, the result is going to be massive corruption.
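One way to guard against pulling the wrong drive is to check the controller's reported state for a slot before anyone touches it. The following is only a minimal sketch of that idea in Python. The `Drive` record, the state labels, and the `check_removal` function are hypothetical names invented for illustration; a real controller exposes this information through its own management tool, and the details will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    slot: int
    serial: str
    state: str  # hypothetical labels: "online", "failed", "rebuilding"


def check_removal(drives, slot_to_pull):
    """Return the drive in slot_to_pull, or raise if removing it looks unsafe."""
    by_slot = {d.slot: d for d in drives}
    if slot_to_pull not in by_slot:
        raise ValueError(f"no drive in slot {slot_to_pull}")
    target = by_slot[slot_to_pull]
    if target.state != "failed":
        # Refuse: this drive is not the one the controller marked as bad.
        failed_slots = [d.slot for d in drives if d.state == "failed"]
        raise ValueError(
            f"slot {slot_to_pull} (serial {target.serial}) is {target.state}, "
            f"not failed; failed slots: {failed_slots}"
        )
    return target
```

Recording the serial number returned by the check and comparing it with the label on the drive actually removed gives a second, physical confirmation.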

The worst example we have seen involved a simple, 2-drive RAID 1 array. The admin removed the “bad” drive. He then proceeded to run a utility on the drive that reallocated the bad sectors, and simultaneously ran a destructive disk test.

Once this had completed, he reinstalled the “repaired” drive. He then rebuilt the array; but in his haste, he chose the “repaired” drive as the (now) good drive, so its wiped contents were copied over the drive that still held the data. The rebuild completed and he ended up with two empty drives, low-level formatted and not recoverable.
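The mistake in this case was the direction of the rebuild. As a rough illustration, the rule "only rebuild from the member that never left the array and was never altered" can be written down as a check. This is a sketch under stated assumptions, not how any particular controller works: the `Member` record, its two flags, and `choose_rebuild_source` are hypothetical, and in practice the admin would have to track this information by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    serial: str
    stayed_in_array: bool    # hypothetical flag: never removed since the failure
    modified_offline: bool   # hypothetical flag: wiped, tested, or "repaired" outside the array


def choose_rebuild_source(members):
    """Return the only safe source for a mirror rebuild, or raise if there is not exactly one."""
    candidates = [
        m for m in members if m.stayed_in_array and not m.modified_offline
    ]
    if len(candidates) != 1:
        # Zero or several candidates: stop rather than guess the direction.
        serials = [m.serial for m in candidates]
        raise ValueError(f"expected exactly one safe source, found {serials}")
    return candidates[0]
```

In the incident above, the “repaired” drive would have failed both conditions, so the check would have pointed the rebuild at the untouched drive instead.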