RAID

  • Steve Jones - SSC Editor (1/26/2011)


    Note that disks still fail. It's why when data is really critical, you can't necessarily wait on a rebuild. A disk failure from the same batch of drives could be followed shortly by another disk failure.

    Put the primary filegroup and log on triple mirroring in these cases.

    This is another reason to use RAID 10 rather than RAID 0+1 or RAID 5. Recovery is generally much faster and has less impact on normal workload while recovery takes place if RAID 10 is used, because normally only a single disc is being recovered and only one of the other discs is being hit with extra reads to support recovery.

    It's also important to have spare discs in-house (preferably already mounted in the racks, designated as "hot spares", and formatted in the RAID controller configuration). The time needed to buy in spares and do physical disc swaps (especially in a dark and unattended environment) should be eliminated to reduce the chance of disaster striking before recovery is complete. Triple mirroring sometimes seems to be seen as an excuse not to do this, but if the data is important you should do both.

    Tom
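
    The rebuild-load argument above can be put into rough numbers. The following is only a simplified sketch: the function name `rebuild_source_disks` is hypothetical, and the RAID 0+1 figure assumes the controller recopies the whole surviving stripe set, which is common but not universal.

```python
def rebuild_source_disks(level: str, total_disks: int) -> int:
    """Disks that take extra reads while one failed disk is rebuilt.

    Simplified model, not a description of any particular controller.
    """
    if level == "raid10":
        # Only the failed disc's mirror partner is read.
        return 1
    if level == "raid5":
        # Every surviving disc is read to recompute the lost data from parity.
        return total_disks - 1
    if level == "raid01":
        # Assumption: the whole surviving stripe set is read to recopy the
        # failed stripe set.
        return total_disks // 2
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("raid10", "raid01", "raid5"):
        print(level, rebuild_source_disks(level, 8))
```

    For an 8-disc array this model gives 1 disc under extra load for RAID 10, 4 for RAID 0+1 and 7 for RAID 5.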

  • Terrific question! (That I got wrong.)

    Although this is not a hardware question per se, it is closer to it than most questions. Maybe we should have more questions like this. I think it's easy to get lost in all the software and neglect hardware tech.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • UMG Developer (1/26/2011)


    Steve Jones - SSC Editor (1/26/2011)


    Note that disks still fail. It's why when data is really critical, you can't necessarily wait on a rebuild. A disk failure from the same batch of drives could be followed shortly by another disk failure.

    Very true, many years ago we had a case where we put 6 new drives in a server and after about 6 months they started failing, and within 2 months all 6 had failed. They were in a RAID5 set, and lucky for us we got the failed drives replaced and re-built before the next one failed.

    Since then, when possible, I try to get drives from multiple sources to try to avoid them all being from one batch.

    You beat me to it. Happened to us, also years ago -- sometime in the early 90s. We replaced a server and its separate drive tower. After a few months, the drives started failing within days of each other. After it happened, our hardware guys found that it wasn't just a bad batch but an apparent design flaw in the particular model of HD. I think we had three fail before we replaced them all with a different model.

    I guess the point is to take advantage of the reduced risk in diversity of storage, whether it's geographic (off-site) or hardware based, and not to depend entirely on RAID of any flavor.

  • john.arnott (1/27/2011)


    I guess the point is to take advantage of the reduced risk in diversity of storage, whether it's geographic (off-site) or hardware based, and not to depend entirely on RAID of any flavor.

    That has always been my thought. If someone makes a mistake and deletes a major set of data, that action still gets propagated to all the mirrored copies. The only way to recover from that (assuming the data was overwritten by new data) is to have a backup of the data from before it was deleted.

  • I just guessed so that I could see the right answer and understand it, and luckily I got it right.

  • OK, so I saw after posting my comment that there were three more pages of comments, and someone made my point rather well: the same minimum number of drive failures can crash a RAID 0+1 or a RAID 1+0 set, and both can handle the same total number of drive failures. So I'm changing my comment. Not sure if this was in there, but another advantage of RAID 1+0 is that you can lose one drive in one mirrored pair, then another drive in another mirrored pair, and so on, and the RAID array is still cooking, mostly. In RAID 0+1, if you lose one drive in one of the striped sets, that striped set is out entirely until you replace the bad drive.

    SQL Managed
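
    The RAID 1+0 versus RAID 0+1 comparison above can be checked by brute force. This is a minimal sketch under stated assumptions: the drive numbering, the grouping of six drives, and the function names are all made up for illustration, and it models only which failure combinations leave the array readable.

```python
from itertools import combinations


def raid10_survives(failed, mirrored_pairs):
    # RAID 1+0 stays up while every mirrored pair keeps at least one drive.
    return all(any(d not in failed for d in pair) for pair in mirrored_pairs)


def raid01_survives(failed, striped_sets):
    # RAID 0+1 stays up while at least one striped set is fully intact.
    return any(all(d not in failed for d in s) for s in striped_sets)


def count_survivable(survives, groups, n_failures):
    """Return (survivable, total) over all combinations of n_failures drives."""
    drives = [d for group in groups for d in group]
    combos = list(combinations(drives, n_failures))
    ok = sum(1 for combo in combos if survives(set(combo), groups))
    return ok, len(combos)


if __name__ == "__main__":
    # Six hypothetical drives, numbered 0..5.
    pairs = [(0, 1), (2, 3), (4, 5)]   # RAID 1+0: three mirrored pairs, striped
    sets = [(0, 1, 2), (3, 4, 5)]      # RAID 0+1: two striped sets, mirrored
    for n in (1, 2, 3, 4):
        print(n, "failed:",
              "RAID 1+0", count_survivable(raid10_survives, pairs, n),
              "RAID 0+1", count_survivable(raid01_survives, sets, n))
```

    Under this model both layouts can be brought down by as few as two failures and neither survives four, but far more of the two- and three-drive failure combinations are survivable with RAID 1+0.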
