RAID 5 and SQL Server

  • Comments posted to this topic are about the item RAID 5 and SQL Server

  • Hi Tony -

    The timing of this article is both funny and informative. I am literally in the process right now of finishing up the design of a new database for my company. We went with hardware similar to a previous configuration of theirs (Dell PowerEdge R710). I beefed it up to spec on the 6-drive-bay version (not bad, given that there wasn't supposed to be a budget for this project). I loaded it with six 146 GB drives in RAID 5 on a controller with a 12 MB cache, and 8 GB of RAM (I wanted to start with 16, but cut it back after going with only the 6-bay drive config).

    Now - I had not even heard this kind of venomous talk about RAID 5 and databases before. My background is over 20 years in I.T., and RAID 5 was always a very popular and steady platform, for the most part. I'm only newly born to SQL Server (since about 2004, more as an application analyst), and this is my first titled role as a DBA, but I have to tell you - I would not have gone above RAID 5 for many reasons (not the least of which is the trade-off between disk space and the cost of it). To learn that there is all this disparity (no pun intended) over the manner in which the controller writes to the disk is nearly laughable, if not questionable. I have never heard of it being that bad in the past, but then again - I wasn't big on Oracle! 😉

    I'll let you know how my first endeavor goes. It's an OLTP system that is being readied for rollout next week!

    Best wishes to you, and keep the articles coming!

    Rich Yarger

    St. Louis Park, MN

    SQL Server DBA

  • The write penalty for RAID 5 is painful, there is little argument there. However, unless you're dealing with incredibly high transaction volumes, the write cache built into most SANs these days alleviates a lot of the pressure.
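
    To put rough numbers on that penalty: every random write on RAID 5 turns into four physical I/Os (read data, read parity, write data, write parity), versus two on RAID 10. A minimal back-of-envelope sketch, assuming ~180 random IOPS per 15k spindle (an illustrative figure, not a measurement):

    ```sql
    -- Back-of-envelope random-write capacity of a 6-disk array.
    -- 180 IOPS per spindle is an assumed figure for a 15k drive; adjust to your hardware.
    DECLARE @spindles int = 6, @iops_per_spindle int = 180;

    SELECT raid10_write_iops = (@spindles * @iops_per_spindle) / 2,  -- mirror: 2 physical I/Os per write
           raid5_write_iops  = (@spindles * @iops_per_spindle) / 4;  -- parity: 4 physical I/Os per write
    -- Roughly 540 vs. 270 random writes/sec before any controller cache masks the difference.
    ```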

    RAID 5 has become more and more viable as different companies have worked out ways to reduce the parity pain. It's also still popular for reporting systems.

    In general, I'd agree. There's a cost tradeoff for RAID in SANs, and the fact that rarely does a system get a purely dedicated spindleset in a SAN means you're not really working from a hardware optimization standpoint *anyway*.

    However, when you start dealing with huge systems where you do need hardware optimization, you're looking at significant cost. Enterprise disks aren't cheap, no matter that some are *cheaper* than others. None of this is cheap. A combination of RAID 5 for less transient data and RAID 10 (my preference: a stripe of mirrors, it has more stability) for the high-volume data (filegroups are your friend...) is workable. I've seen workarounds ranging from multiple RAID 5 setups which were then AGAIN RAID 5'd (don't ask) to any number of other ways to try to deal with RAID 5 issues.

    In the end, it's cost vs. throughput. There is a price for everything. SQL Server is a mid-cost product, and thus, a lot of times we end up with mid-cost solutions for all but the largest endeavours. We've learned to cope.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    Twitter: @AnyWayDBA

  • I'm used to using RAID 10 almost always for DB setups. But I think that RAID 5 will gain significance once again when SSDs begin to be more widely used for databases.

    There are several reasons to prefer RAID 5 in such setups:

    - SSDs are generally fast enough, even on writes, to compensate for the parity overhead

    - SSDs are expensive per GB (currently - around $2/GB for MLC) and having more usable GB is good

    Currently I'm considering how to advise a customer on their new DB server, and I'm debating with myself :w00t: whether to advocate for RAID 5 or even RAID 6.

    It will be most interesting for me to see the opinion of the other members on this.
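
    For what it's worth, here are the rough numbers behind the $/GB point above, as a sketch with an assumed drive count and today's roughly $2/GB MLC pricing (illustrative only, not a quote):

    ```sql
    -- Usable capacity and cost per usable GB for 8 x 256 GB SSDs at an assumed $2/GB.
    -- Drive count, size, and price are assumptions for illustration.
    DECLARE @disks int = 8, @gb_per_disk int = 256, @usd_per_gb money = 2.00;
    DECLARE @total_cost money = @usd_per_gb * @disks * @gb_per_disk;               -- $4,096

    SELECT raid5_usable_gb          = (@disks - 1) * @gb_per_disk,                 -- 1,792 GB
           raid10_usable_gb         = (@disks / 2) * @gb_per_disk,                 -- 1,024 GB
           raid5_usd_per_usable_gb  = @total_cost / ((@disks - 1) * @gb_per_disk), -- ~$2.29
           raid10_usd_per_usable_gb = @total_cost / ((@disks / 2) * @gb_per_disk); -- $4.00
    ```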

  • I think of the disk design more in terms of RAID channels matched to purpose, separating loads across them. An OLTP system I built for a large indie record label a few years ago had 7 channels:

    RAID 1 - operating system and apps, main swap file

    RAID 10 - SQL data

    RAID 10 - SQL indexes

    RAID 1 - logging

    RAID 5 - backups to disk

    - plus two for the application data / reports etc., both RAID 10

    Also, if specific tables in the database get a hammering, they can be partitioned out onto other RAID channels.
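
    As a minimal sketch of that last point (the database, filegroup, file, and path names here are all made up for illustration), a hot table can be given its own filegroup on a volume backed by a different RAID channel:

    ```sql
    -- Hypothetical example: place a heavily hit table on its own RAID channel (assumed drive R:).
    ALTER DATABASE Sales ADD FILEGROUP HotTables;

    ALTER DATABASE Sales ADD FILE
    (
        NAME = Sales_HotTables_01,
        FILENAME = N'R:\SQLData\Sales_HotTables_01.ndf',
        SIZE = 10GB, FILEGROWTH = 1GB
    ) TO FILEGROUP HotTables;

    -- New (or rebuilt) objects can then be created on that filegroup.
    CREATE TABLE dbo.OrderDetail
    (
        OrderDetailID bigint IDENTITY PRIMARY KEY,
        OrderID       bigint NOT NULL,
        Qty           int    NOT NULL
    ) ON HotTables;
    ```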

  • Depending on the load, I would also add a separate disk/array for tempdb.
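
    Something like this is all it takes to relocate tempdb once the dedicated volume exists (a minimal sketch; the drive letter and folder are assumed, and the change only takes effect at the next service restart):

    ```sql
    -- Point the default tempdb files at a dedicated volume (assumed T:).
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = N'T:\TempDB\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = N'T:\TempDB\templog.ldf');
    ```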

    RAID 5 for backups is totally OK, but I think an important question is whether RAID 5 is usable for the main DB storage. And maybe the same question can be asked for the log?!?

    Another question that pops into my head is whether SSDs will change the game, and if so, how? Perhaps their speed will lead to simpler disk designs?

    I'm going with RAID 5 for the filegroups / .NDFs, and something smaller for the OS and installed instance / .MDF. The .LDF will be on yet another disk that is controlled off the mainboard - probably (most likely some form of EIDE or SATA, if available to my spec).

    This is a great discussion and is giving me cause for thought on going with RAID 10 on the main array if I can manage it this late in the game for my current project.

    As you mention, the SAN companies have combated the issue with larger caches, and to my knowledge my company uses RAID 5 exclusively in all SAN arrays. We differentiate tier 1 from tiers 2 and 3 by the size of the cache on the arrays and the RPMs of the disks. We consider our tier 1 to be EMC Symmetrix with 15k disks and the larger write cache. We have HP XP arrays that have mixed disks.

    Ultimately, from a DBA team perspective, for better or worse, our SAN configuration is a black box because there are so many different arrays and brands in play (EMC Symmetrix/Clariion, NetApp, etc.). We haven't seen very many problems with RAID 5, and our SAN admins continually tell us that is how they carve up the arrays.

    They pre-provision the arrays when they are built, so it's difficult for us to even request a RAID 10 LUN from them because they simply "don't do it".

    Best regards,

    Steve

  • I recently spec'd two new Dell PowerEdge R710 servers for a small data warehouse, one for Analysis Services and the other for SQL Server. The choices for storage were constrained by capacity, performance, and the number of direct-attached storage disk connections. I ended up choosing RAID 5 for both because I had to meet the performance and storage requirements but was limited to 8 DAS connections. The AS system has an 8 x 146 GB 15k rpm drive array and the SQL Server system has an 8 x 600 GB 10k rpm drive array. A RAID 10 array could have worked if I had been willing to use slower disks to regain the capacity lost to RAID 10 versus RAID 5. I'm convinced that RAID 10 is the right choice but can't escape practical realities.

  • I think RAID 5 must be buried.

    It was introduced back when magnetic disks were still very expensive and limited in capacity, while at the same time there was a need for redundancy. Nowadays, disks are cheap and BIG. So big, in fact, that while a RAID 5 array is rebuilding there is a sizable chance of a random read error occurring, and then all data will be lost.
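
    To put a rough number on that rebuild risk (a back-of-envelope sketch; the disk size, disk count, and the 1-in-10^14 unrecoverable-read-error rate are assumptions typical of large nearline drives, not measurements):

    ```sql
    -- Odds of hitting at least one unrecoverable read error (URE) while rebuilding
    -- a 6 x 2 TB RAID 5 set: the surviving 5 disks must be read end to end.
    DECLARE @surviving_disks float = 5, @tb_per_disk float = 2;
    DECLARE @bits_to_read float = @surviving_disks * @tb_per_disk * 1e12 * 8;  -- ~8e13 bits
    DECLARE @ure_per_bit  float = 1e-14;                                       -- assumed drive spec

    SELECT p_rebuild_hits_ure = 1 - POWER(1 - @ure_per_bit, @bits_to_read);    -- roughly 0.55
    ```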

    Add to that the complexity of the system, and with it the sensitivity to errors and longer downtimes, and you will conclude that RAID 5 is expensive and unreliable compared to other solutions such as RAID 10.

    And really, most database servers are fairly small and do not have full-time dedicated admins. This is especially true for such an accessible product as SQL Server. Why Microsoft claims that RAID 5 is popular with its customers is beyond me. Maybe they are polling the large customers, not the hundreds of thousands of smaller ones.

    At my company we have now twice had a disk in a RAID 5 set break where the array could not rebuild itself, and we had to fall back to backups and continue working on another server. Identical failures on RAID 10 never caused any issues or significant downtime.

    Is there really ANY reason to go for RAID 5 today?

    I think not.

    As for the previously mentioned speed argument... think also about limited use of an SSD to compensate for slower but larger magnetic disks. They are quickly becoming affordable, and their performance shatters that of spinning media!

    At a previous company, where we used internal server disks instead of SANs, we used an interesting mix of RAID levels for the different drives. And as usual the answer is "it depends"... For our systems that were massively critical and performance oriented, we used RAID 10. For our backup, archive and reporting systems we used RAID 5. Now I am on SANs and hardly get to see/touch/change anything on them.

    As was said earlier though, SQL Server was (and is) a cost-effective solution for databases compared to Oracle, and the hardware seems to fall in line with that. It would be interesting to see Oracle and SQL Server next to each other on the same hardware doing the same tasks :hehe:

  • We asked our centralized IT server support organization about using SSDs, but they will not support them for the time being.

  • Write penalties on RAID: in the nineties an issue on server hardware; in 2011, an issue on gamer PCs.

    In former times it was the parity calculation and the limited processing speed of hardware RAID controllers that led to reduced data transfer rates. E.g. the Adaptec AAA2400 has 4x UDMA 33 channels and natively supports RAID 5.

    In stripe-set mode on 4 disks the controller writes about 126 MB/sec, which is the practical limit for 32-bit/33 MHz PCI (32 bits x 33 MHz ≈ 133 MB/sec theoretical). In RAID 5 mode the CPU on the controller (an Intel i960) gets really hot, and it writes at about 30 MB/sec.

    This penalty is more or less gone, at least if customers don't buy their servers at Walmart with an ICH10R built in.

    Current quality hardware is able to calculate redundancy in real time. With 4 disks a simple XOR algorithm can be used, but the ICH10R is much faster running on odd disk counts. It is cheap scrap, and an admin will die in hell if he uses an ICH10R in production.

    I can also understand the Oracle guys living with a dozen independent disks in their system, while forgetting that this multiplies faults rather than tolerating them.

    Both strategies (RAID and individual disks) allow offline recovery if hardware fails, and RAID 5 on cheap hardware is nearly intolerable.

    In general I would talk about reliability and not about "I like / I dislike RAID". Enterprise storage has to be reliable; that is a fact. It has no flaws during a rebuild and just works.

    Spending $100,000 on an enterprise SAN doesn't leave gaps and is fast enough, so we're really discussing small companies and $1,000-$5,000 entry-class servers, barely worth being called servers.

    Anyway, a good RAID controller should NOT show any kind of penalty nowadays, nor any even/odd disk-count affinity.

    In principle, RAID 5 with an n+1 layout can die easily if a disk fails and then the next disk fails during the rebuild. RAID 5 with n+2 (effectively RAID 6) could be a better (and more expensive) solution, but on smaller systems with only 4 disks you give up 2 disks to redundancy either way, in that setup as well as in RAID 10. RAID 10 will die later in this configuration, but the risk of disks failing together is greatly increased by simply buying a batch of disks from the same production date.

  • Since I work with the largest SQL Server implementations in the world, I consider it essential that RAID 5 be removed from the picture, even for read-mostly databases. I am a huge proponent of the "kill RAID 5 for databases" movement.

    RAID 10 is still the best general recommendation for databases, with RAID 1 being a good recommendation for some data warehouses (see the Fast Track Implementation Guide). Those DBAs implementing RAID 1 tend to be more advanced and know their application well enough to balance the disk IO over multiple database files and multiple LUNs themselves.

    One thing that most people miss is that many of the newer disk controllers/HBAs will do dual reads off the mirror. So even read-mostly databases will get a big benefit from implementing RAID 10. Check your cards to see if they have this capability, even in the SANs. An interesting side note is that the lower-cost SANs were among the first to include this type of card.

    As far as SSDs go, they are the great equalizer in the random vs sequential IO discussion and really have nothing to do with RAID 5 vs RAID 1/10. You still pay the write penalty with RAID 5. Having said this, you can probably get some relief if you are having IO waits in a RAID 5 implementation with SSD just because it is faster. But at some point if you grow big enough you will end up killing RAID 5 anyway.

    As for the discussion about disk caching helping RAID 5: what you are really saying is that it helps up to a point. In the large systems I work with, write caching is practically worthless and is generally turned off, because the volume of writes will overrun the cache and the system ends up waiting for its turn to write to the cache. This holds true for any RAID implementation.
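
    A quick sketch of why the cache only helps up to a point (all figures here are assumptions for illustration): once sustained writes arrive faster than the back-end array can destage them, the cache fills and you are back to raw array speed.

    ```sql
    -- Assumed numbers: 4 GB of write cache, 400 MB/sec of sustained incoming writes,
    -- and a RAID 5 back end that can only destage 250 MB/sec.
    DECLARE @cache_mb float = 4096, @incoming_mb_s float = 400, @destage_mb_s float = 250;

    SELECT seconds_until_cache_is_full = @cache_mb / (@incoming_mb_s - @destage_mb_s);  -- ~27 sec
    ```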

    In reality, SSD + write caching will help in the small to mid size RAID 5 implementations. But if you know you are going to experience significant growth over the next few months/years, go ahead and start moving to a mirrored RAID solution.

    I would like to close with the statement that those using RAID 5 don't get the right to complain their DBMS doesn't scale.

    Kevin Cox - SQL CAT

  • I always prefer to consider facts rather than rely completely on theory. Some years back I conducted a test of SAN RAID 5 disk arrays vs. RAID 10 arrays, for various sizes and combinations of read and write activity. I found that as disk size increased, the performance of the RAID 5 arrays caught up to the RAID 10 arrays, and by the time I reached 512 GB disks, the RAID 5 array was actually faster. Unfortunately I no longer have the results, but in any case your mileage may vary depending on what hardware you are using. If you have a SAN and you have the flexibility to try both options, I would suggest a direct comparison using the SQLIO benchmark tool, and also try timing some simple SQL Server operations (e.g. backup).
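
    For the "time a simple operation" approach, something as blunt as backing up the same database to a file on each candidate array and comparing the throughput that BACKUP reports is a reasonable first pass (the paths and database name below are hypothetical):

    ```sql
    -- Back up the same database to each array and compare the MB/sec figure
    -- that SQL Server prints when the backup completes.
    BACKUP DATABASE AdventureWorks
        TO DISK = N'E:\Raid5Lun\AdventureWorks_r5.bak'
        WITH INIT, STATS = 10;

    BACKUP DATABASE AdventureWorks
        TO DISK = N'F:\Raid10Lun\AdventureWorks_r10.bak'
        WITH INIT, STATS = 10;
    ```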
