Mirrored to oblivion

We recently made a trip to the data center to replace a faulty 900 GB hard drive in a RAID-10 logical volume of a 3.5-year-old HP server. Even in the modern age of cloud hosting we still manage a few physical servers, for various reasons but mostly for our clients. We had known since last year that the drive might fail because of a growing number of soft errors, so we ordered two extra drives in advance to be ready for day X: one as a replacement and another as a spare for that volume.

According to a Backblaze report, 22% of hard drives fail within their first 4 years of operation. When you plan hardware for a new server or workstation, disk storage redundancy may create a false feeling of safety: it is too easy for a system admin to assume that reserving an extra disk is “enough protection”. A typical RAID-5 volume keeps running when one drive goes down, but a second physical disk failure takes the volume with it. That setup is risky, though less expensive per gigabyte. But what can go wrong with a more expensive, fully mirrored volume like RAID-1 or its popular sibling RAID-10? By design, the whole content of hard drive number one is secured by constantly copying it to hard drive number two and vice versa, right? Yes and no. In the past we have observed more than one case where a hard drive died quietly, its only symptom being slower I/O performance. The cause of the slowness would go unnoticed for a few more weeks, until drive number two went for a Styx ride. Then it is a complete disaster: the volume becomes inaccessible and has to be recreated from scratch.

Early alerts and proactive monitoring are the true remedy for that situation. Redundancy by itself does not guarantee that the storage volume will last forever. When you install or run a new server, remember to configure email notifications for hardware trouble, and check the system logs or event registry regularly. It should go without saying that recent reserve copies (a.k.a. backups) of the server content are critical; that is part of the 101 of dealing with precious storage devices that were not designed to keep your data for eternity. Everything is temporary in this material world, even the brave tin hard drive, and because of that we should always be prepared.
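
As a rough illustration of what such a proactive check can look like, here is a minimal Python sketch. It assumes a Linux software RAID (md) setup, which is not what the HP server in this story uses; a hardware controller would need the vendor's own tool instead. The function name degraded_arrays and the SAMPLE text are made up for this example. The idea is simply to read the status text that the kernel exposes in /proc/mdstat and report any array that has fewer active members than expected.

```python
import re

# Start of an array entry in /proc/mdstat, e.g. "md0 : active raid1 ..."
_ARRAY = re.compile(r"^(md\d+)\s*:")
# Member status on the following line, e.g. "[4/3] [U_UU]"
# (expected members / active members, then one U or _ per member)
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that are missing at least one member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        array = _ARRAY.match(line)
        if array:
            current = array.group(1)
            continue
        status = _STATUS.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample text: md0 is healthy, md1 has lost one of four members.
SAMPLE = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid10 sdd1[3] sdc1[2] sda2[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

unused devices: <none>
"""

if __name__ == "__main__":
    # On a real machine you would read /proc/mdstat instead of SAMPLE,
    # run this from cron, and send an email when the list is not empty.
    print(degraded_arrays(SAMPLE))
```

For the sample text only md1 is reported. A check like this catches a volume that is already running on one leg of a mirror; it does not catch a drive that is slowly dying but still a member, so it complements, rather than replaces, watching soft error counts and I/O performance.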

Comments

(2)

Dennis Gorelik
05/12/2014 at 1:53 pm #

> hard drive would die quietly

Is it because you did not set up email alerts in these cases, or because something else failed?

- Andrei Spassibojko
  05/12/2014 at 6:01 pm #
  
  It is just an example of what happens without alerts, failure notifications, and proactive monitoring for system failures.
