RAID 5 and SQL Server

  • In debating RAID 5 against RAID 10 performance for SQL we need, in my view, to distinguish between read and write performance

    For read performance:

    SQL read page requests (except for SQL read-ahead) are generally un-overlapped: your request waits until the read is satisfied before it can continue, and you must retrieve the top index level before the next index level, before the leaf index level, before the data row. In effect, poor read performance mushrooms. In partial mitigation of this, metadata and non-leaf index pages have a relatively higher hit rate than index leaf and data pages, so they tend to stay buffer or cache resident longer
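
    That sequential traversal can be sketched as a quick calculation. All hit rates and latencies below are illustrative assumptions, not measurements:

```python
# A sketch of how un-overlapped index traversal compounds latency.
# All hit rates and latencies are illustrative assumptions, not measurements.
def expected_traversal_ms(levels, physical_ms=8.0, buffer_hit_ms=0.001):
    """Sum the expected cost of reading each level in sequence:
    a hit costs a buffer-pool lookup, a miss costs a physical read."""
    total = 0.0
    for _name, hit_rate in levels:
        total += hit_rate * buffer_hit_ms + (1 - hit_rate) * physical_ms
    return total

# Upper index levels stay resident far longer than leaf and data pages.
levels = [("root", 0.999), ("intermediate", 0.99), ("leaf", 0.90), ("data", 0.80)]
print(round(expected_traversal_ms(levels), 2))  # dominated by leaf/data misses
```

    Even with high hit rates on the upper levels, the expected latency is dominated by the leaf and data page misses at the bottom of the chain.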

    RAID 10 may satisfy read requests faster than RAID 5; a competent RAID 10 implementation will allow access to the first available mirror copy. In effect, path contention can be halved by RAID 10, or it can be seen as doubling the effective device bandwidth. In a multi-user SQL environment this can be very significant. In some SAN environments path contention (rather than device contention) may be the major bottleneck, and the benefit of RAID 10 over RAID 5 may not be seen because of this constraint

    A large device cache may, or may not, be an advantage to read performance. In most of the cached devices I have worked with recently, including SANs, the SQL buffer pool has been significantly larger than the device cache. In effect, any SQL buffer miss is almost by definition also a device cache miss (both use similar ageing algorithms).

    In my (biased) view, additional RAM allocated to SQL buffer pools, rather than a larger device cache, is far more effective until the device cache is multiple times larger than the SQL buffer pool. A SQL buffer hit may be more like 10,000 times faster than a physical read; a device cache hit may be less than 100 times faster than a physical read under load (depending on implementation and path contention)
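
    As a rough sketch of why, taking the ratios above as assumptions (physical read = 10 ms, buffer hit 10,000 times faster, device cache hit 100 times faster):

```python
# Assumed latencies, scaled from the ratios in the post above.
PHYSICAL_MS = 10.0
BUFFER_HIT_MS = PHYSICAL_MS / 10_000       # SQL buffer pool hit, ~1 us
DEVICE_CACHE_HIT_MS = PHYSICAL_MS / 100    # device cache hit, ~100 us

def effective_read_ms(buffer_hit_rate, device_hit_rate):
    """Expected latency: buffer hit, else device cache hit, else physical read."""
    miss = 1 - buffer_hit_rate
    return (buffer_hit_rate * BUFFER_HIT_MS
            + miss * device_hit_rate * DEVICE_CACHE_HIT_MS
            + miss * (1 - device_hit_rate) * PHYSICAL_MS)

# RAM spent on the buffer pool vs. the same RAM spent on the device cache:
print(effective_read_ms(0.99, 0.50))   # larger buffer pool
print(effective_read_ms(0.90, 0.95))   # larger device cache instead
```

    With these hypothetical hit rates, the larger buffer pool still wins even though the device cache hit rate is far higher in the second case, because a buffer miss pays the device cache's extra hop either way.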

    For write performance:

    Data page write requests are effectively overlapped. I can find no specific reason why writing of dirty data pages during checkpoint should delay transaction processing, except if the flushing of pages is sufficiently slow as to affect the free page replenishment algorithm. In effect, if data page write is sufficiently slow, it may cause free list stalls, with all the negative performance implications thereof. To the SQL heavies: if I have missed some aspects of data page write, please let me know

    Log page write requests are not overlapped; an implicit or explicit transaction is not complete until the log pages reflecting that request are confirmed as written. The effect of poor write performance should not be cumulative, as it is in read performance; log write may require multiple physical writes, but they can be viewed essentially as either a single request, or multiple independent requests

    RAID 5 page writes require cache hits (not buffer hits) on the page to be written and the parity page, and possibly all the pages in the parity set (there are some implementations that verify the parity before write). In mitigation, the page(s) can be retrieved in parallel, but the physical write is delayed by the retrieval process. It is likely that SQL log page writes have a very low cache hit rate (log writes go into empty pages rather than previously used pages); the retrieval of the parity page, and possibly the parity set, is what delays the physical write. The same delay of the physical write is not true for RAID 10 implementations
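
    The parity overhead can be counted as physical I/Os per small write. This is the common textbook read-modify-write model; real controllers vary, so the counts are illustrative only:

```python
# Physical I/Os needed per small (partial-stripe) write under the common
# textbook model; real controllers vary, so these counts are illustrative.
def ios_per_small_write(layout):
    if layout == "raid10":
        return 2   # write the data page to both mirror copies
    if layout == "raid5_rmw":
        return 4   # read old data + old parity, write new data + new parity
    if layout == "raid5_parity_cached":
        return 2   # parity already in device cache: write data + parity only
    raise ValueError(layout)

# Log writes go to fresh pages, so the parity is rarely cached and
# RAID 5 tends toward the 4-I/O read-modify-write path.
print(ios_per_small_write("raid5_rmw"), ios_per_small_write("raid10"))
```
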

    RAID 5 page writes require the page and the parity page to be physically written. In some implementations the writes are not strictly in parallel; recovery from failed writes requires a particular sequence of writes. Essentially the same is true for RAID 10 implementations

    In all SQL-supported implementations the modified page and parity page (where applicable) must be stored in non-volatile cache before the write can be posted as complete, regardless of where they were retrieved from. In almost all implementations I have come across, the non-volatile cache is significantly smaller than the volatile cache. In most implementations the size of the non-volatile cache is not readily increased. I have yet to meet the non-volatile cache that is not frequently (constantly?) overrun under load, thus delaying write completion until space is available in the non-volatile cache. In effect, for SQL, write buffering reduces the write time, but in practice it is still significantly dependent on the physical write time of the underlying pages

    What distinguishes RAID 5 write from RAID 10 write is that generally two entries are required in the non-volatile cache for RAID 5 as against a single entry for RAID 10; RAID 10 thus effectively doubles the usable size of the non-volatile cache and reduces contention for space in it. The non-volatile cache is, in almost all RAID implementations, a severely constrained resource. In addition, RAID 5 SQL log write time is significantly delayed by the retrieval of the parity page, which for the SQL log is seldom a device cache hit, almost regardless of the device cache size. This may significantly compound the contention for non-volatile cache space
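
    A toy calculation of that footprint difference, with an assumed non-volatile cache size:

```python
# Toy footprint model: how many in-flight writes fit in the non-volatile
# cache before it overruns? NV_PAGES is an assumed cache size.
NV_PAGES = 8192

def writes_before_overrun(nv_pages, entries_per_write):
    return nv_pages // entries_per_write

print(writes_before_overrun(NV_PAGES, 2))  # RAID 5: data page + parity page
print(writes_before_overrun(NV_PAGES, 1))  # RAID 10: a single entry
```

    Whatever the actual cache size, RAID 10 fits twice as many in-flight writes before stalling on non-volatile cache space.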

    Thus, in my view, a large SQL buffer pool, a RAID 10 implementation, and as large a non-volatile cache as I can get are very desirable

    In my view SSDs do not change this equation, except that physical write rates to SSDs are significantly greater than to traditional disks. However PCIe-attached SSDs can be seen as a very large but relatively slow non-volatile device cache, which can benefit both read and write

  • I have worked with medium to large databases in SQL Server and Oracle for some time and have seen just about any combination of disks you can imagine for them. When our financial system went from RAID 5 to RAID 10, the disk queues shrank 40% and IO improved more than imaginable. We got to be heroes once the move was finished. At the time it was an IO-intensive 50GB database with about 800 users.

    Our POS system, which is Microsoft front to back, was crippled by an oversized RAID 5 array. We switched it to a RAID 10 array and did nothing else. The fix took reports from a 30-hour run to a six-minute run. When we redistributed the files to improve the IOPS, things got even better. Nothing changed but the disk configuration. The system works now like it should.

    As hardware has gotten better the gap has narrowed some. The question you have to ask is, "How important is my data?" I want our people to get their data instantly every time. I don't want anyone to have to wait for information they need to make our organization work.

  • I may get my official membership in BAARF revoked, but in Oracle SQL tests in spring '09 with a then-new midrange IBM POWER6 server and DS-series SANs connected by dual 4Gb FC HBAs, I saw no discernible IO performance difference in read or write between RAID-5 and RAID-10 setups, even under unrealistically stressed conditions (tests were throttled solely by the CPUs' ability to generate enough IO). I forget the exact array sizes for each, but the SAN caching was truly effective in nullifying the write performance penalty. I should also mention that we have an SVC frontending the SANs, although I'm not sure how that would affect IO performance for the testing, unless its cache is additively effective (when it's not shuffling volumes around).

    One other knock on R5 is the array rebuilding performance hit after a lost drive is replaced. We've lost multiple drives since then (likely due to thermal stress after an AC failure), albeit no more than one at a time in any array, and I've seen zero performance effect from either the failures or the replacements, at least on our production Oracle box.

    When relaying this to my former boss who is BAARFier than I am, he said that they came to the same damnable conclusion. I guess technology really does change. 🙂

    That being said, there is ZERO chance I'll approve of a standalone RAID-3/4/5/6/etc config for critical production systems. It's just the BAARF in me...

  • The popularity of RAID 5, I think, derives from the types of installations around the world and the people setting up these new databases. Mom-and-pop shops may not have the money for RAID 10; RAID 5 may be pressing their budget, but it is the best option for them. Whether it is an appropriate choice also depends on the type of database and the volume of transactions. Not every database needs to be on RAID 10 - it just doesn't make sense for the cost versus transaction volume in many cases.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Having read all of the posts to this point, I find it interesting the arguments for/against one RAID vs another.

    In my opinion and experience, the RAID rule of thumb has been that data segments went on RAID 5 and T-logs on RAID 1 or, if possible, 10. In the case of SANs, there was a lot more creativity, but essentially the same.

    That being said, I feel that one should also consider the type of database, whether OLAP or OLTP as well as the size and number of users.

    In the case of OLTP you can place the data segment(s) on RAID 5, which will allow better read capabilities for those who are doing some reporting, while the write portion (T-log) is on RAID 10. This RAID is overall better for writing data. Then, as all know, when a checkpoint occurs the data is written to the database files without much of an impact to the data-entry side.

    In larger or even busier databases, it may be of benefit to partition the database itself into separate files, placing these files onto an appropriate RAID array, i.e. indexes for a certain table in a separate filegroup on a separate RAID 5 array, while the data for that table may be on its own RAID 5 array.

    I personally like RAID 10 for my T-Logs as this array is actually more conducive to faster writing speeds.

    On larger databases, I also try to ensure that at least my RAID 5 and 10 are on separate channels on the controller, if not on separate controllers altogether.

    I also make sure that I have my tempdb on yet another RAID 10 array, on separate controllers, in order to ensure that there is no contention there either (most of my tempdbs are approximately 100GB+ and heavily used). I have found in several cases a need to partition these into separate files as well and move them to separate drives.

    I also have a few small databases all on RAID 10.

    So I guess I would say that it isn't as simple as saying RAID 1 or 5 or 10 or 50 or ...

    As for IO extremes, such as reducing reports from 30 hours to minutes - I suspect there is a lot more to those than a simple move of one database or database file.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • peter-757102 (1/10/2011)


    At my company, twice now a disk in a RAID 5 array broke and the array could not restore itself; we had to use backups to continue working on another server. An identical issue with RAID 10 never caused any significant downtime.

    I noticed this post and had to respond. In a properly configured RAID 5, a single disk failure would not cause a failure of the array. And even if there was a failure of the writing of the data to the replaced disk, no one would have known it because the array would have kept chugging along like nothing happened.

    So, I'm wondering if your aversion to RAID 5 is simply based on misunderstanding? What you've described cannot happen in the real world.

  • coldsteel2112 (4/6/2011)


    I noticed this post and had to respond. In a properly configured RAID 5, a single disk failure would not cause a failure of the array. And even if there was a failure of the writing of the data to the replaced disk, no one would have known it because the array would have kept chugging along like nothing happened.

    So, I'm wondering if your aversion to RAID 5 is simply based on misunderstanding? What you've described cannot happen in the real world.

    Theory and practice are two entirely different things.

    The RAID controller choked early during the repairs and never recovered; we had to send the disks to a specialised company to retrieve the data!

    And the time during repairs of a RAID 5 is VERY risky for the validity of your data. At that point there is NO redundancy, and the disks themselves cannot be read on their own either, so ANYTHING that goes wrong will cause data loss or corruption.

    And with today's big disks, the chance of something like that happening is simply too large.

    To even consider using RAID 5 today on large spinning disks is, IMHO, on that basis alone a bad idea!

    Now, for SSDs there might be an exception, as they are so fast and more reliable; rebuilding can be fast too.

    But even then, I strongly prefer the much simpler and transparent method of mirroring!

  • peter-757102 (4/7/2011)


    Theory and practice are two entirely different things.

    The RAID controller choked early during the repairs and never recovered; we had to send the disks to a specialised company to retrieve the data!

    And the time during repairs of a RAID 5 is VERY risky for the validity of your data. At that point there is NO redundancy, and the disks themselves cannot be read on their own either, so ANYTHING that goes wrong will cause data loss or corruption.

    And with today's big disks, the chance of something like that happening is simply too large.

    To even consider using RAID 5 today on large spinning disks is, IMHO, on that basis alone a bad idea!

    Now, for SSDs there might be an exception, as they are so fast and more reliable; rebuilding can be fast too.

    But even then, I strongly prefer the much simpler and transparent method of mirroring!

    So if the controller failed, then no matter what flavor of RAID you were running, you still would have had the same issue. Any disks in a failed array would still need a rebuild by the controller once the new disk was installed. So blaming RAID 5 for your issue is a bit unfounded.

    Not saying that RAID 5 is better or worse, but I've been using it for the past 15 years on hundreds of servers (internally) and have never once seen one disk fail along with another immediately after. Not saying that could never happen, just telling you my experience. As long as you have a spare configured and the controller set to high priority on the rebuild, you mitigate the length of time the array is in a vulnerable state.

    Now, with all that said, I agree 😛 I would much prefer RAID 10 or, if I have the capacity, RAID 50. Both possess a greater amount of fault tolerance than 5. So unless there is a space issue (only allowed 4 drives, for instance) and you need the most out of the capacity of those 4 drives (4 x 500GB drives: RAID 5: 1397GB vs. RAID 10: 931GB), there is no real reason to choose RAID 5.
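
    The capacity arithmetic can be sanity-checked with a quick sketch (drive sizes taken as decimal GB and reported in binary GiB, as Windows does); this is the ideal case with no hot spares:

```python
# Ideal usable capacity (no hot spares); drive sizes in decimal GB,
# results in binary GiB as Windows reports them.
def usable_gib(n_drives, drive_gb, layout):
    gib = drive_gb * 1000**3 / 1024**3   # marketing GB -> binary GiB
    if layout == "raid5":
        return (n_drives - 1) * gib      # one drive's worth lost to parity
    if layout == "raid10":
        return (n_drives // 2) * gib     # half the drives mirrored away
    raise ValueError(layout)

print(round(usable_gib(4, 500, "raid5")))   # 1397
print(round(usable_gib(4, 500, "raid10")))  # 931
```
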

    So, like all things in life, its about how much money are you willing to throw at it. 😉
