Trans Log Backups killing the server

  • Where to start?... I have a fairly new, decent clustered server with 16 GB of RAM and 4 dual processors.  I'm doing trans log backups every 4 hours because when the trans log backup job runs, it sends the CPUs spiking and slows the server down a lot.  However, I recently had to move a few more databases over to this server (we have another cluster that was having weird data corruption issues - but that's another story!), and the trans log backup now takes twice as long; customers are calling to complain about it (the job runs at noon, but we have customers in different time zones, so they're not all out to lunch).  The trans log is written to the same striped disk array that holds the data.  It's been suggested that disk contention could be an issue?  However, we'd have to buy an expensive expansion device for the RAID enclosure to separate the trans logs out and back up to a different physical disk.

    My server currently has about 530 databases on it (about two-thirds of them are not heavily used).  At 4 PM, when I run sp_who, I get 2,941 spids. 

    The server is only 8 months old, but have we maxed the thing out already?  We seemed to be doing pretty well until I moved those 20 databases over a few days ago.  I've searched, but I can't find anything useful on the internet about SQL maint jobs causing a server to spike and slow down this badly.

    Thanks for your input and suggestions!

    Paula

  • I am assuming that you are in a high-transaction environment and presumably have large transaction logs when they're backed up. Why not back up the logs more frequently, every 15 minutes or less? This might still cause some issues, but each backup itself will be over in no time, so hopefully no one ever notices.
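    A minimal T-SQL sketch of what each frequent log dump would look like (the database name and path are placeholders; in practice this would run as a scheduled SQL Agent job step):

    ```sql
    -- Hypothetical example: dump one database's log every 15 minutes.
    -- NOINIT appends to the file so earlier dumps in the chain are kept.
    BACKUP LOG [YourDatabase]
    TO DISK = N'D:\LogDumps\YourDatabase_log.trn'
    WITH NOINIT
    ```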

  • Do you have access to any other disk?

    Your protection is not very good if you are backing up the transaction logs to the same disk arrays the databases are running on. Also, you are exposing yourself by waiting until noon to back up the transaction logs. As our volume grew, I was considering going from every hour to every 30 minutes.

    If you have no available disk on this server that isn't busy, why don't you try going to a network disk? If you have 1 or 2 Gb connectivity (that is not saturated), this would slow down the log dump (which is actually good for you if you have performance problems).
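    For what it's worth, pointing a log dump at a network share needs nothing special beyond permissions — a sketch with a placeholder UNC path (the SQL Server service account needs write access to the share):

    ```sql
    -- Dump the log straight to a UNC path instead of local disk.
    BACKUP LOG [YourDatabase]
    TO DISK = N'\\BackupServer\LogDumps\YourDatabase_log.trn'
    ```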

    Remember, your objective is 24/7 computing. Attempting to maintain the server during "quiet" times is quickly becoming a thing of the past.

    Rick Phillips

  • Thanks, guys, interesting replies.  Yes, it's a high-transaction environment.  I used to back up the trans logs every hour and never received complaints from customers about it, but a dev guy noticed the server spike every hour and thought it would be a better idea to do them only every 4 hours.  (I disagreed but got "over-ruled".)  We talked a bit today about backing up to another location.  Currently I have a job that copies the latest trans log backups over to a datacluster server. 

    Is there anything special I'd have to do to just write the backups straight to the datacluster instead?  Does it have to be mapped or created as a backup device?  I think I'm going to use our dev server to play around with this idea... because it sounds promising.

    If you have any other thoughts or things I should look at, let me know.  Thanks!

  • What happens if you back up the log every 10 minutes? Maybe the spikes will be much shorter and it won't cause any problems...

  • We used UNC network shares \\ServerName\LogDumps.

    The share was a Compaq RAID-5 array sitting on our primary server (the production databases were on an EMC CLARiiON SAN).

    The log dumps that came across the 1 Gb network ran 40% slower than the direct-attached disk.

    My belief is that running the log backups more often would be beneficial.  For most of the databases each dump would take only a short time, and for the large-volume databases the logs would be captured and moved out quickly.

    Now I noticed that you have a large amount of memory (16 GB) on this server.  Are you aware of the AWE bug that was introduced by the latest SQL 2000 service pack?

      Rick Phillips


  • Why is it that a discussion between a DBA and a developer always seems to end with someone in management deciding that the dev guy is right? If DBAs were paid attention to, the proverbial wouldn't be hitting the fan so often.

  • Care to elaborate on this?

  • Now I noticed that you have a large amount of memory (16 GB) on this server.  Are you aware of the AWE bug that was introduced by the latest SQL 2000 service pack? 

    Rick, no, that one doesn't sound familiar.  I'll search for it, but if you have the KB number, please post it when you have a chance.  Thanks!

  • Search on these boards..

    sp4 awe bug should give you something like this :

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=5&messageid=182669

  • Thanks, Rick!  I ran the script from KB 899761 and the darn thing returned min=0, max=1, config_value=1, run_value=1.  Looks like I have the AWE bug in full effect here!  Oh @#$%^@! 

    Wait... duh... I just had a "moment" there.  I'm still on SP3a for SQL on my servers.  Good to know about this issue, though, because we've been talking about whether SP4 would help with some of our problems.
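    In case it helps anyone else checking the same thing, the service pack level and the AWE setting can both be confirmed from Query Analyzer (a sketch; 'show advanced options' may need to be enabled first to see the AWE setting):

    ```sql
    -- Which build/service pack is this instance on?
    SELECT SERVERPROPERTY('ProductVersion') AS Version,
           SERVERPROPERTY('ProductLevel')   AS ServicePack;  -- e.g. 'SP3' vs 'SP4'

    -- Is AWE actually enabled?
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'awe enabled';  -- run_value = 1 means AWE is on
    ```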

  • Paula

    When you run your log dumps, are they serially threaded?  If you try to run log dumps against all the databases on your server at exactly 12:00 noon, they will all try to steal memory and I/O from each other.  This in turn would put your 4 CPUs into a thrashing condition, pinning the CPU at 100%.

    Are you using the native SQL Server backup program?  Most recently we were using SQL LiteSpeed (a SQL Server Central sponsor).  LiteSpeed was very effective at using whatever resources were available, and it allows tuning to control how much of the available resources are dedicated to the backups.

        Rick Phillips
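    One way to force the dumps to run one at a time is a simple cursor loop — a sketch only, with a placeholder path and no error handling:

    ```sql
    -- Back up each database's log serially so the dumps don't
    -- compete with each other for memory and I/O.
    DECLARE @db sysname, @cmd nvarchar(500);
    DECLARE dbs CURSOR FOR
        SELECT name FROM master..sysdatabases
        WHERE name NOT IN ('master', 'model', 'msdb', 'tempdb')
          AND DATABASEPROPERTYEX(name, 'Recovery') <> 'SIMPLE';
    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @cmd = N'BACKUP LOG [' + @db
                 + N'] TO DISK = N''D:\LogDumps\' + @db + N'_log.trn''';
        EXEC (@cmd);
        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs;
    DEALLOCATE dbs;
    ```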


  • Hi Rick,  Currently I'm using the native maint plan trans log backup.  It surprises me that the included maint plan backup wouldn't run serially against each database; is that what you've seen in your experience?  One of the dev guys is also working with me on a scripted version of the maint plan, which, since it's a script, will run serially. 

    (Note to the earlier person who posted:  I really admire our dev guys and get along with them... it's just natural that sometimes we're going to come at an issue from a different point of view and experience.  So no offense meant to the developers out there.)

    Thanks,

    Paula

  • We experienced many performance problems trying to use the maintenance plans.

    What we ended up doing was to have a SQL Agent job with a step for each database (you could generate the steps using the name column from the sysdatabases table). 

    I think that you will find that this will resolve your performance problems and then you can return to a schedule of hourly (or more frequent).
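    The step generation described above can be sketched like this — run the SELECT, then paste each generated command into its own job step (the output path is a placeholder):

    ```sql
    -- Generate one BACKUP LOG command per user database from sysdatabases,
    -- skipping system databases and anything in simple recovery.
    SELECT 'BACKUP LOG [' + name + '] TO DISK = N''D:\LogDumps\'
           + name + '_log.trn'''
    FROM master..sysdatabases
    WHERE name NOT IN ('master', 'model', 'msdb', 'tempdb')
      AND DATABASEPROPERTYEX(name, 'Recovery') <> 'SIMPLE';
    ```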


        Rick Phillips

  • Update:  So a bunch of us sat yesterday with bated breath while watching the noon trans log backup.  The CPU didn't really seem that bad - it had many spikes to 80%, but most of the utilization seemed to land on just one of the 8 processors.  However, I did notice that trying to do anything in Enterprise Manager during the job was pretty much impossible: even something that should be quick hung at the hourglass for many minutes.

    I think today, before we try anything else, I'm going to try writing that noon backup to our data cluster.  (I wanted to do it yesterday, but instead spent most of the day on the phone with Microsoft about a pesky data corruption issue in another cluster... maybe I should create a new post for that one?)


    Thanks,

    Paula
