Opportunity to separate trans logs & tempDB - Sorting out RAID options and best practices

  • We're about to add some more drives to our server and I could use some help deciphering all the conflicting information about what setup would serve us best. (long post - see "My Questions" near the end if you just want my core questions.)

    We're currently running 6 X 420gb disks in a RAID 5 with everything going to the same logical drive (not my design). We're running low on capacity, so we're going to add some more drives. I have 8 slots to work with.

    I see this as a chance to introduce a lot of best practices. I would like to take this opportunity to do the following:

    - move backups to their own drives

    - move tran logs to their own drive

    - maybe move tempdb to its own drive

    Profile of our load:

    - Several hundred connections making lots of little writes to a "RawLog" database (a few million writes per day)

    - One connection copying those writes to a "Report" database every minute

    - A handful of users running gigantic queries on the "Report" database

    - All on the same server (I know that was painful to read, now you know my pain)

    Basic Perf Metrics:

    - 150-200 Batch requests/sec

    - 4 dual core CPU's are averaging 10-30%

    - RAM is typically healthy with occasional dips (>99% buffer cache though value is inflated by all the little writes; page life expectancy in the thousands with occasional dip to 500s)

    - FAS (not SAN); Avg. Disk sec / Transfer baselines at 2ms, can get up to 15ms; Avg Disk Queue Len around 4, can get up to 60.

    What I've Read:

    All kinds of stuff both here, around the web, and in books. Summary of key conclusions:

    - SEPARATE LOCAL BACKUPS.

    Obvious benefits I think, but to restate, eliminates I/O contention during backup procedures, plus reduce fragmentation of the data disks since backup data is not mixed in. (This assumes a scenario where you are doing a local backup before copying it across the network.)

    - SEPARATE TRAN LOGS.

    Having transaction logs on their own disks is good so that sequential tran log writes don't have to contend with data read/lazy writes. Since you want optimal WRITE performance, RAID 1 or 10 is better than RAID 5. According to Kendal Van Dyke's article (http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html) RAID 10 writes about twice as fast as RAID 5 given the same number of disks. RAID 1 and RAID 10 performance were about the same, so I conclude the RAID 1 writes would also be about twice as fast.

    Of course all this depends on the number of spindles. As Sean McCown pointed out, spindle count is a critical consideration as a 10 spindle RAID 5 will outperform a 2 spindle RAID 1 every time (http://www.sqlservercentral.com/Forums/Topic451170-146-1.aspx).

    - MULTIPLE DATA FILES on USER DB's - NOT A PERFORMANCE MOVE.

    According to Robert Dorr / Kalen Delaney, this can help performance during recovery, but SQL Server does not actually do more "parallelism" just because you're using multiple files. (http://blogs.msdn.com/b/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx) This is more of a flexibility/manageability move.

    - SEPARATE TEMPDB.

    Useful in environments that use a lot of temp tables where TEMPDB contention can become an issue. Multiple data files CAN be helpful here because the amount of create/drop activity can cause hotspot contention in the allocation tables, and having a separate data file per processor can alleviate that (provided the data files are the same size and you've followed certain TempDB optimization recommendations, etc).

    Again this depends on spindles available. As GrumpyOldDBA points out, you can actually introduce a new bottleneck if you move TempDB to an array with less spindles that your data array (http://www.sqlservercentral.com/Forums/Topic312669-65-1.aspx).

    Trying to apply what I've Read:

    One option I had considered:

    - Leave core data where it is (6 x 420gb drives on RAID5 + 1 hotspare)

    - 2 x 600gb drives on a RAID1 mirror for TRAN LOGS (I currently have 350gb of tran logs)

    - 2 x 50gb drives on a RAID1 mirror for TempDB data

    - 4 x 1TB drives on RAID5 for backup/archives

    Thoughts on that option / MY QUESTIONS:

    TRAN LOGS:

    - Will moving the tran logs from a 6-spindle RAID 5 to a 2-spindle RAID 1 actually slow writes down overall? That is, I assume a 2-spindle RAID 1 would only write about as fast as a 4-spindle RAID 5, so this might be a bad move.

    - If there are multiple database, does having the logs separated from the data really buy you much? Now the logs are having to contend with each other.

    - If you're running Replication, don't you need good READ performance against the logs?

    TEMPDB:

    - Will moving the TempDB from 6 spindles to 2 spindles create the bottleneck GrumpyOldDBA mentions?

    "RAWLOG" DB:

    - Should I try to get "RawLog" DB on different disks from "Report" DB? I'm running out of spindles here...

    I feel like I'm having a "cant-see-the-forest-for-the-trees" moment and would appreciate any advice or observations.

    Thanks!

  • Here's what I've decided to go with - a nice standard "medium" type solution (in Denny Cherry terms):

    - 4 new drives 600GB 15k rpm in a RAID 10 for Tran Logs

    - 4 new drives 1TB 7.2k rpm in a RAID 5 for backups/archives

    I think this is as exotic as I have room for. My data is creeping out of "medium" territory so I'm tempted to do something more complicated, but I just don't have enough spindles for it to have the appropriate impact.

    Comments still welcome.

  • will s (2/2/2011)


    Thoughts on that option / MY QUESTIONS:

    TRAN LOGS:

    - Will moving the tran logs from a 6-spindle RAID 5 to a 2-spindle RAID 1 actually slow writes down overall? That is, I assume a 2-spindle RAID 1 would only write about as fast as a 4-spindle RAID 5, so this might be a bad move.

    - If there are multiple database, does having the logs separated from the data really buy you much? Now the logs are having to contend with each other.

    - If you're running Replication, don't you need good READ performance against the logs?

    TEMPDB:

    - Will moving the TempDB from 6 spindles to 2 spindles create the bottleneck GrumpyOldDBA mentions?

    "RAWLOG" DB:

    - Should I try to get "RawLog" DB on different disks from "Report" DB? I'm running out of spindles here...

    I feel like I'm having a "cant-see-the-forest-for-the-trees" moment and would appreciate any advice or observations.

    Thanks!

    First comment: Very nice summary of previous research Will, and dead on accurate. I might point people with RAID questions to this thread, I liked it that much. 🙂

    So, 6 Spindle Raid5 vs. 2 Spindle Raid 1. Offhand, I'd say you're going to get the same I/O throughput, but you're going to remove double write contention. What I mean by that is you won't be trying to write to the log while you're trying to write to the database.

    You're correct about the logs contending with each other, but only to a certain point. Remember, the log writes contiguously through a specific process, and only during writes. You're not doing many large writes, but a thousand small ones. Best practice is, yes, each log file exists on its own physical spindle. Reality and cost usually prevents this. Getting these logs to their own physical spindle in general though will help anyway, because of the previously mentioned double write contention. You don't have write cache in an FAS (I think...).

    Replication: If you're running Transactional Replication... yes. You'll also want good read for backups, too. Logs are primarily write driven, however, and should be concentrated on from that perspective, unless you're looking at 10+ subscribers.

    Tempdb: This is a contention point, and can only be explained through statistics and you'd need to analyze your system. Because you have large reports coming out of your system, you have to figure out where they're sorting. If they're straight data-dumps to the reporting platform which then massages the data, you're fine, don't sweat it. If there's ORDER BY clauses on these queries, you're hammering your TempDB.

    Now, if TempDB is in the same place it's reading the historical data from, you might have issues. The reason you typically want to get TempDB off to its own spindles is to remove the simultaneous contention between the reads from the database and the read/write as you manipulate secondary #Temp and @temp tables and the sorting it'll do for result sets there.

    So, I haven't answered your question, but simply given you more to think about. In this case, I'm not sure there's a lot you can do. I'd leave it on the main spindle set. You've only got 8 spindles to work with.

    For your databases, no, I wouldn't get them onto separate spindles in this case. You've got enough other things that would do you better. But I *would* make sure you size them at the beginning for the next year or two of growth so you have contiguous space assigned to them.

    With 8 spindles, I'd usually do something like this:

    - 2xspindle Raid 1 OS/Swapfile/Local Backups. Transfer local backups to network storage after backups complete.

    - 2xspindle Raid 1 Log file. If I was read heavy instead of write heavy, I *might* put tempdb here only after some experimentation.

    - 4xspindle RAID 5 (for space). Main MDF storage.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig, that actually helps validate the direction I'm going. I appreciate your thorough feedback.

    One more question on log contention: I basically have an OLTP database doing lots of little writes per second, and then an "OLAP" style database on the same server repeating those same writes, but at a minutely interval. On the OLTP database under heaviest load I might see 25,000 inserts and 1,000 updates over the course of a minute. Then a minute later I re-insert all of those in one fell swoop to the OLAP database. It's about 1MB worth of data.

    I assume since it is copied with a single mass insert that the tran log write to the OLAP database would be considered a "long" write comparatively since it's all 25,000 records at once. Long writes as I understand it actually perform a little better on RAID 5, or at least are not twice as bad as RAID 10 (though I'm a little muddy as to why). So I'm considering leaving the OLAP database logs on the original 6 spindle RAID 5. This way the continuous little writes aren't having to compete with the one long write once per minute on the new Log array. Does that theory hold water?

    (By the way to clarify I actually have 15 total spindles to work with, 7 are in use and I have 8 more that I can add.)

    Thanks again.

  • will s (2/3/2011)


    One more question on log contention: I basically have an OLTP database doing lots of little writes per second, and then an "OLAP" style database on the same server repeating those same writes, but at a minutely interval. On the OLTP database under heaviest load I might see 25,000 inserts and 1,000 updates over the course of a minute. Then a minute later I re-insert all of those in one fell swoop to the OLAP database. It's about 1MB worth of data.

    Ah, I see where the confusion here is. The reason you want logs moved off to a different spindle is because as you write to the database you simultaneously write to the log file. If they're on the same physical spindles your disk head is flying around like an Irish Sword Dancer.

    With that in your mind...

    I assume since it is copied with a single mass insert that the tran log write to the OLAP database would be considered a "long" write comparatively since it's all 25,000 records at once. Long writes as I understand it actually perform a little better on RAID 5, or at least are not twice as bad as RAID 10 (though I'm a little muddy as to why).

    This depends ENTIRELY on hardware, raid controllers, and the like. Do not take this as gospel until you've tested it yourself on your equipment.

    So I'm considering leaving the OLAP database logs on the original 6 spindle RAID 5. This way the continuous little writes aren't having to compete with the one long write once per minute on the new Log array. Does that theory hold water?

    Unfortunately, no. You want your log to not be a problem, or borrowing time, from the disk head while it tries to do its 'long write'. Get your logs off the same spindle, long or short. When this occurs, though, expect a little 'stuttering' (at the millisecond level) from your OLTP since it's sharing log spindles with the OLAP.

    (By the way to clarify I actually have 15 total spindles to work with, 7 are in use and I have 8 more that I can add.)

    Thanks again.

    Oooooooohhhh... 🙂 Nummy. That's a LOT more room to work in and can certainly help you out. However... how many unique physical setups can your FAS's RAID Controller handle? It's been a bit since I dealt directly with a FAS so hopefully someone will correct me if I'm wrong here. There's only a certain number of 'channels' that a Controller can have, so there's only so many simultaneous physical LUNs you can hit at once.

    Hmmm, if I can find a few hours later I think I'll go exploring the web and remind myself about that construction and equipment.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply