Clustering and SQL recovery mode

  • I've gotten myself into an odd situation. A few months after an upgrade from SQL 2000 to 2005 (which I was NOT involved with), the customer comes back to us complaining that most of the SQL agent jobs aren't running or are failing. So I go in there and enable the disabled ones and fix the ones that were broken. I also notice that (1) all databases are in Simple recovery mode [we typically recommend full recovery for our application] (2) Growth by 10% on all dbs [we recommend setting it to a particular value because growth by % can get you in trouble if you don't watch it all the time] (3) most recent backup was from JANUARY 2010 (4) the Backup jobs were "off" (5) the disabled backup jobs were set to backup to S: drive (5) All data and log files are on S: drive. (6) S: drive doesn't have much space available for backups. They have a SQL cluster and I was logged into the SQL cluster (rather than into one node or the other).

    They respond with :

    "Having separate drives for data and log files will not gain any performance benefits as any drive on the cluster is on the same network storage array. Even though the server sees the s: drive as a local disk, it actually exists across the network on our SAN (with iSCSI protocol)... we won't be growing S for backups - production backups are done at the SAN level - the D drive on SQL server is for AD-hoc backups - we do not write backups to expensive network storage, but a production restored would come from the SAN anyway....

    "Databases are backed up at the SAN level every 4 hours and kept for 4 days. They are also replicated to another site.

    "Because we back up at the SAN level and not at the server level, simple recovery mode is a requirement. There is a job that runs nightly that does a traditional SQL backup to network storage."

    I'm thinking that these guys are wrong, but not sure how to tell them they're nuts. From what the guy said, it sounds like at best, they can recover to data that's 4 hours old! Yet we have so many transactions going on in our application, that most customers have nightly backups and back up tran logs every 15 minutes.

    My question : am I just not understanding SAN backup scenarios ? or is this guy wrong?

  • LadyRuna (10/8/2010)


    I've gotten myself into an odd situation. A few months after an upgrade from SQL 2000 to 2005 (which I was NOT involved with), the customer comes back to us complaining that most of the SQL agent jobs aren't running or are failing. So I go in there and enable the disabled ones and fix the ones that were broken. I also notice that (1) all databases are in Simple recovery mode [we typically recommend full recovery for our application] (2) Growth by 10% on all dbs [we recommend setting it to a particular value because growth by % can get you in trouble if you don't watch it all the time] (3) most recent backup was from JANUARY 2010 (4) the Backup jobs were "off" (5) the disabled backup jobs were set to backup to S: drive (5) All data and log files are on S: drive. (6) S: drive doesn't have much space available for backups. They have a SQL cluster and I was logged into the SQL cluster (rather than into one node or the other).

    They respond with :

    "Having separate drives for data and log files will not gain any performance benefits as any drive on the cluster is on the same network storage array. Even though the server sees the s: drive as a local disk, it actually exists across the network on our SAN (with iSCSI protocol)... we won't be growing S for backups - production backups are done at the SAN level - the D drive on SQL server is for AD-hoc backups - we do not write backups to expensive network storage, but a production restored would come from the SAN anyway....

    "Databases are backed up at the SAN level every 4 hours and kept for 4 days. They are also replicated to another site.

    "Because we back up at the SAN level and not at the server level, simple recovery mode is a requirement. There is a job that runs nightly that does a traditional SQL backup to network storage."

    I'm thinking that these guys are wrong, but not sure how to tell them they're nuts. From what the guy said, it sounds like at best, they can recover to data that's 4 hours old! Yet we have so many transactions going on in our application, that most customers have nightly backups and back up tran logs every 15 minutes.

    My question : am I just not understanding SAN backup scenarios ? or is this guy wrong?

    Tell them you find the strategy interesting and you would like to learn more about it. Ask for documentation, a link to the vendor, anything that helps you understand how that backup strategy works. 🙂

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Well..there's a few things going on here. They're correct on a few points...and incorrect on quite a few.

    First and foremost, if they decided against taking txn log backups, then Simple is fine. If they decide one backup every 4 hours is sufficient and they're OK with losing up to that amount of data, that's their choice. I would ask if they've ever done a test recovery to see if their san-level backups even work.

    As far as performance goes..if it's all on the same raid group on the SAN, then there really isn't a performance difference by splitting up into logical drives and separating the log and data files. This is a poor practice on their end, but there is probably nothing you can do to change that.

  • Check with them that their SAN backup solution is supported by SQL (typically will hook into SQL to freeze the IOs during the snapshot). Otherwise when it shapshots the datab and log file there's a good chance that the resulting copy will not be usable.

    Ask if you they can test a restore from their backup, because you need to be certain that it's all working fine.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, guys. That has reassured me that my statement weren't totally off-base.

    I will ask the customer about whether he's ever tested restoring from the SAN. I bet he hasn't tried it yet. 🙂

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply