Preferred RAID

  • RAID 5

    Yesterday we had a somewhat, how do I put this, poor question of the day. I had asked what the disadvantages of RAID 5 vs. RAID 1 were on your server. Needless to say I didn't have a great answer, or even a good one, and everyone got their points back. But some interesting discussions took place and it got me thinking.

    What RAID level do you tend to implement?

    Note that I said "tend". What I mean is: in your daily work, in the real world, what are your SQL Servers set up to use, for whatever reason? I'm sure many people would love to know your reasons.

    If cost weren't an issue, and I could afford hardware in whatever form I needed, I'd look to RAID 10 for all my servers, with separate arrays for logs, tempdb, the OS, the pagefile, etc. The reality, however, is that many of us get fewer drives than we'd like and have to make compromises.

    I have to say that I've tended to go with RAID 5 in most of my installations, to balance cost and space needs along with performance (see the rough capacity comparison after this post). I've done RAID 1 and RAID 10 in places, but only rarely.

    I will say that I've tended to set up multiple logical drives on my RAID partitions in most cases, just in case the budget ever comes through to "move" those drives to their own array.
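
    A rough way to see the cost/space trade-off described above is to compare usable capacity per RAID level for a given drive count. The sketch below is a hypothetical back-of-envelope calculation (invented drive counts and sizes), not tied to any particular controller, and it ignores hot spares and formatting overhead.

    ```python
    # Back-of-envelope usable capacity for common RAID levels.
    # Assumes identical drives; ignores hot spares and controller overhead.

    def usable_gb(level, drives, drive_gb):
        """Usable capacity in GB for a simple single-array layout."""
        if level == "RAID 0":
            return drives * drive_gb           # striping only, no redundancy
        if level in ("RAID 1", "RAID 10"):
            return drives // 2 * drive_gb      # mirrored pairs: half the raw space
        if level == "RAID 5":
            return (drives - 1) * drive_gb     # one drive's worth of parity
        raise ValueError("unknown level: " + level)

    if __name__ == "__main__":
        for level in ("RAID 1", "RAID 5", "RAID 10"):
            print(level, "with 6 x 300 GB drives ->", usable_gb(level, 6, 300), "GB usable")
    ```

    With six 300 GB drives that works out to roughly 1500 GB usable under RAID 5 versus 900 GB under RAID 1/10, which is the cost argument in a nutshell.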

  • I'd say we tend to do RAID 1. Years ago I had a RAID 5 box go down and had to go to tape. It was the company's only server for that application. It sucked. So even though I always hear RAID 5 is better, that experience just scared the shit out of me. Am I the only one who's had RAID 5 crap out on them?

  • preferred .. RAID 10.

  • At Pixolüt, we have a strategy of using RAID 1 for the more lightweight web-facing production environments we deploy, simply for redundancy in the name of cost management. For our core platforms, though, we have a mid-range RAID 5 setup using one of the first PCIe RAID controllers, the Intel SRCU42e.

    The great thing about the Intel is that it can support huge data transfer rates inside the server (PCIe x8 offers roughly two to four times the top speed of PCI-X 133, depending on whether you count one or both directions; rough numbers follow this post) and also supports 512 MB of DDR333 cache on the controller. We also use Fujitsu 10,000 RPM drives as a compromise between speed and reliability. Whilst it may not be fair to all RAID controllers, I have found that using more than one logical drive tends to kill the seek and read optimizations, so we have ours set up as one big NTFS volume.

    http://www.pixolut.com
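
    For context on the bus comparison above, here is the back-of-envelope bandwidth arithmetic using published theoretical peaks (real controllers get nowhere near these figures): PCI-X at 133 MHz on a 64-bit bus peaks around 1.06 GB/s, shared and one direction at a time, while first-generation PCIe x8 gives about 2 GB/s in each direction simultaneously.

    ```python
    # Rough theoretical peak bandwidth comparison: PCI-X 133 vs PCIe 1.0 x8.
    # Published spec numbers; real-world throughput is lower.

    pci_x_133 = 133.33e6 * 8                   # 64-bit bus at 133 MHz ~= 1.07 GB/s, half-duplex
    pcie1_lane = 250e6                         # PCIe 1.0: ~250 MB/s per lane per direction (after 8b/10b)
    pcie1_x8_one_way = 8 * pcie1_lane          # ~2 GB/s per direction
    pcie1_x8_both_ways = 2 * pcie1_x8_one_way  # ~4 GB/s counting both directions

    print("PCI-X 133   : %.2f GB/s (shared, half-duplex)" % (pci_x_133 / 1e9))
    print("PCIe 1.0 x8 : %.2f GB/s per direction" % (pcie1_x8_one_way / 1e9))
    print("Ratio (one way): %.1fx   Ratio (full duplex): %.1fx"
          % (pcie1_x8_one_way / pci_x_133, pcie1_x8_both_ways / pci_x_133))
    ```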

  • Re the post below on RAID 10... if it's critical, put a SAN in!  Our current IBM DS8000 systems have 384 drives; the new ones will have 640!  Everything is written to battery-backed memory cache (up to 256 GB) and the system then controls when and where it's written across the disks.  This makes for extremely fast storage!

    Yes, we've had SAN failures - but they're extremely rare - and important systems have mirrored storage arrays on different sites and/or log shipping to guarantee uptime.

    Good to see the edit function deleted my original post!!! 

    Basically SANs negate the entire RAID issue, multiple mirrors, etc., as the data is written to memory and then across a large array of disks - there is no performance gain in slicing a SAN into separate LUNs.  The only real requirement is when you need to stop a DB grabbing space and affecting other DBs within that LUN.

  • We try to use RAID 1 / 10 wherever possible. Rarely will we submit to RAID 5. Why? Because it has limited redundancy - if you lose one of the drives in a RAID 5 array, your data is at risk as if it were on RAID 0; lose another drive and you kiss your data goodbye. Also, whilst you are a disk down you will take a performance hit as the other disks work to reconstruct, via the parity bits, what the data on the missing drive would be (a toy illustration of that reconstruction follows this post). And then, when you have replaced the faulty drive and are rebuilding it, you still take a performance hit (I have heard up to 80% degradation) - all the while at risk of losing it all if another disk goes down.

    If you think the odds of losing more than one drive are really low - well, it's happened to me before, on more than one occasion and on different servers, and I know of others who have had similar experiences. When it happened at one place I worked, where we had compromised even though we had warned the business of the risk, the business soon changed its mind and we got whatever was required to give us RAID 10.
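
    To make the parity-reconstruction point concrete, here is a toy sketch (invented block contents) of how single-parity RAID rebuilds a lost member with XOR. It is purely illustrative, not how any real controller lays out stripes, but it shows why one lost drive is recoverable and a second is not.

    ```python
    # Toy illustration of single-parity (RAID 5 style) reconstruction via XOR.
    # Real controllers rotate parity across drives and work on whole stripes;
    # this only shows why losing ONE member is recoverable and losing TWO is not.

    from functools import reduce

    def xor_blocks(blocks):
        """XOR a list of equal-length byte blocks together, column by column."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    # Three data "drives" plus one parity "drive" (hypothetical contents).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Lose one data drive: rebuild it from the survivors plus parity.
    lost = 1
    survivors = [blk for i, blk in enumerate(data) if i != lost]
    rebuilt = xor_blocks(survivors + [parity])
    assert rebuilt == data[lost]
    print("rebuilt drive", lost, "->", rebuilt)

    # Lose a second drive before the rebuild finishes and there is nothing
    # left to XOR against: the array is effectively RAID 0 with a hole in it.
    ```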

  • Usually faced with less-than-ideal server purchases, I have tended to go with RAID 1 instead of RAID 5.  Even if I only have 4 drives, I will set up two RAID 1 arrays and have the system and logs on one and the data on the other. If I am lucky enough to have 6 drives, I will set up three RAID 1 arrays (system, data, and logs on separate arrays).

    Since I am a consultant and recognize that the client will not do the usual DBA tasks after I leave, RAID 5 with a badly fragmented database and indexes will eventually just bring that server to its knees.  I get called in at least once a year to a client that has this situation.  I just back up the system, reconfigure the drives to RAID 1, restore, and they are good to go.



    Mark

  • Sorry justinb486, but I am going to have to disagree with your comment that a SAN basically negates the entire RAID issue. This comes from experience of having worked in numerous environments where SAN admins and architects thought just that and threw DBs onto LUNs that were shared by other apps, and even had the volumes virtualized (so that the DB viewed the volume as a large virtual array, when in reality it had only a small volume assigned that would be grown as more space was required).

    Each time there were performance problems, SQL Server was blamed. Following extensive monitoring we were able to prove that the SAN was the bottleneck, especially around I/O. This has happened at more than five major companies that I have either worked for or been called into as a consultant/contractor; all but one are big names in the financial sector. Presently I work for a company that has thousands of online customers connecting and using the systems concurrently. One of our partners, who set up their SAN along the 'large array of disks' lines you mention, has suffered I/O problems, whereas we have not, because we carved out the disks specific to what the DB servers require based on I/Os and throughput (a rough sizing sketch follows this post).

    There are a number of others who have had first-hand experience of SAN performance issues; to name a few from previous discussions on sqlservercentral, Tony Rogerson and Colin Leversuch-Roberts. You might also want to look at the following webcast by David G Brown, then an MS Solutions Rapid Response Engineer for Enterprise Customers, in which he goes into a lot of the misconceptions regarding SANs and I/O and also how the caching works: Microsoft Support WebCast: I/O performance problems and resolutions in Microsoft SQL Server 2000 - April 13, 2005 (sorry, I don't have the link, but hopefully it can be found through Google).
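
    For what it's worth, "carving out disks based on I/Os and throughput" usually boils down to a simple sizing calculation: back-end I/Os = reads + (write penalty x writes), then divide by what a single spindle can sustain. The figures below (per-spindle IOPS, workload mix) are invented assumptions for illustration only.

    ```python
    # Hypothetical spindle-count sizing from a workload's I/O profile.
    # All numbers are illustrative assumptions, not measurements.

    import math

    def spindles_needed(read_iops, write_iops, write_penalty, per_spindle_iops):
        """Spindles required once the RAID write penalty is applied to writes."""
        backend_iops = read_iops + write_penalty * write_iops
        return math.ceil(backend_iops / per_spindle_iops)

    # Example workload: 2000 reads/s and 800 writes/s against 10k RPM drives
    # assumed to sustain ~130 random IOPS each.
    for level, penalty in (("RAID 10", 2), ("RAID 5", 4)):
        n = spindles_needed(2000, 800, penalty, 130)
        print(level, "needs roughly", n, "spindles for this workload")
    ```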

  • When using local disk we build two RAID sets on our SQL Servers.  Set 1 is a 2-disk RAID 1 for the OS, app, and logs.  Set 2 is RAID 5 for data, with as many spindles as we can justify.  We always include a hot spare to address the risk and performance issues involved with a drive failure.  On rare occasions we have had two drives fail one after the other, but in every case the hot spare has had time to rebuild from the first disk failure before the second failure occurred.


    Ross Hamilton
    DBA

  • I tend to use RAID 5 with a battery-backed controller, and often with a hot spare configured when using the 14/15-drive sets. I have had drives fail a number of times, but never more than one at a time - never broke the stripe.

    SANs do a lot to boost performance, but you still need some decent disk planning to make them work. Not quite a miracle cure!

  • At all of the companies I have been at, we tend to use RAID 5.  Many times it has to do strictly with cost: you get more usable disk for the money with RAID 5 while still having some redundancy.  At the risk of annoying some people, the other reason we have done RAID 5 is lack of knowledge on the OS administrators' part - they just haven't understood that in high-transaction environments RAID 1+0 performs better (some rough write-penalty arithmetic follows this post).  Having said that, if you have the money for a high-end RAID array with lots of redundant controllers and cache, you can practically eliminate the write penalty with RAID 5.

    On the failed-disk side, I have only seen RAID 5 fail when it isn't managed well.  With the low cost of disk drives I think you should always have a spare on hand (even if it's in a file cabinet).  Unfortunately, I've seen too many times where a disk fails and then people are slow to react and get a replacement drive installed.  In one case response time was slow enough that a second drive failed (4 weeks after the first!) and we had to restore from tape.  I really wouldn't recommend waiting more than a few hours to get a replacement drive installed if you don't have a hot spare.

    Bottom line with any RAID: you still need to monitor the health of your drives daily!  (end of rant)
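
    To put a number on the "write penalty" mentioned above: each front-end write costs roughly 2 back-end I/Os on RAID 1/10 and roughly 4 on RAID 5 (read data, read parity, write data, write parity), which is exactly what a large battery-backed write cache helps hide. Below is a rough sketch of delivered IOPS for a fixed set of spindles under invented assumptions (8 spindles, ~130 random IOPS each, no cache benefit).

    ```python
    # Rough front-end IOPS a fixed array can deliver at a given read/write mix.
    # Invented assumptions: 8 spindles at ~130 random IOPS each, no cache benefit.

    SPINDLES = 8
    PER_SPINDLE_IOPS = 130
    PENALTY = {"RAID 10": 2, "RAID 5": 4}   # back-end I/Os per front-end write

    def frontend_iops(level, write_fraction):
        """Front-end IOPS when each write costs PENALTY[level] back-end I/Os."""
        backend = SPINDLES * PER_SPINDLE_IOPS
        cost_per_io = (1 - write_fraction) + PENALTY[level] * write_fraction
        return backend / cost_per_io

    for mix in (0.1, 0.3, 0.5):
        print("write mix {:.0%}: RAID 10 ~{:.0f} IOPS, RAID 5 ~{:.0f} IOPS".format(
            mix, frontend_iops("RAID 10", mix), frontend_iops("RAID 5", mix)))
    ```

    The gap widens as the write share grows, which matches the high-transaction point above; a big write-back cache can absorb bursts and narrows it in practice.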

  • RAID 1 for OS and applications on local disks, all data lives on the SAN in dedicated RAID 10 volumes.

    Artificial Intelligence stands no chance against Natural Stupidity.

  • iwq is certainly correct. The tuning of a SAN installation is critical to getting good results.

    I have reviewed several SAN installations, and the majority had performance problems because the people who were administering those installations did not understand how the SAN worked. A good monitoring and load testing program is essential.

  • I'm having problems posting to this site - it keeps losing or overwriting my posts :'(

    To iwg - my previous post got mangled too!  We don't share LUNs between the RDBMS and applications; they're separate LUNs, for failover, control, relocation, etc.

    From our dealings with Microsoft architects, SAN specialists, Oracle, etc., we've found that presenting multiple LUNs to an instance for system DBs, tempdb, and user DBs provides no real value.  They all use the same failover HBAs to travel over the same fabric to the same core switches and the same SAN, along with many other servers.  All that is achieved on Wintel is wasting drive letters (thankfully SQL05 uses mount points!).

    We have dedicated LUNs per Instance for failover control, with Instances holding 1 critical DB or several less critical DBs (by Business Group).

  • See why RAID-5 is BAD BAD BAD at the "Battle Against Any RAID-F" website (http://www.baarf.com/):

    It's important to note that RAID-3, -4, and -5 are all included in the initiative, so the F in BAARF stands for Five, Four, and ...err... Free.

    The reason for BAARF is that we’ve had it. Enough is Enough. For 15 years a lot of the world’s best database experts have been arguing back and forth with people (vendors and others) about the pros and cons of RAID-3, -4 and -5.

    Cary Millsap has written excellent articles on RAID technologies that should have stopped the useless and pointless discussions many years ago. Many others have written splendid articles about it as well. Many.

