NetApp SnapManager for SQL

  • I'm sorry if some things weren't clear. Let me explain a little. I love the snapshot technology, especially for large databases. Backups and restores can be lightning fast. There is also the option of cloning, where you can set up a copy of a database without taking up any additional space on the server (though it still takes up space on the backend datastore). In addition, the database could be 2 TB in size yet be presented almost instantaneously.

    My point, though, is that even though the snapshot technology is fantastic, the SMSQL tool was built by people who know storage but don't have any clue what it is like to be a DBA managing a large number of SQL Servers. NetApp isn't the only culprit. Others include Avamar, SyncSort and more. Storage teams love these tools because the tools leverage storage technology. But these tools are a nightmare for DBAs to manage and use. It really is a square peg in a round hole.

    Case in point: as I've mentioned before, SMSQL gets all its configuration from flat files (I wish I could tell you where these are located, but they should be somewhere in the initial install directory). Now, if you look at every database management tool (Idera, Quest, Redgate), they all use database repositories. Why? Because they know it's fast, it's good design, and it allows each DBA to install a client on their local workstation for management. How do they know this? They have DBAs working on the tools. NetApp knows this because I've talked to their product development team. Will they make a change? Probably not. Why? Because they're focused on storage, not DBAs, and it probably isn't cost-effective for them to rewrite the application to please DBAs.

    I also had a conversation with their product development team about adding and removing databases and updating the configuration. Initially, they were very defensive but eventually admitted it was a limitation and they would work on it.

    So, in summary, because the snapshot technology is so good, I would continue to recommend it if you are only managing 10 or so systems. Any more than that and you'll want to reconsider. Ideally, I would only recommend it for systems that have 1 or 2 very large databases (> 500 GB), systems that do not change frequently (add or remove databases), and systems that can have dedicated LUNs for the data files (no application files on the same LUNs). Quiescing is required, which is why your system databases have to be separate and can't be part of a snapshot, but quiescing also requires a lot of configuration planning up front.

    I've talked extensively about this in my mssqltips posts: http://www.mssqltips.com/sqlserverauthor/55/scott-shaw/

    Again, sorry for any confusion. I really hope NetApp gets their act together, or maybe one of the other vendors like Idera can work with NetApp to implement the snapshot technology into their backup tools. That would be cool.

    Scott

  • We've had SnapManager in place now for several months as our primary backup solution and I can't say I'm overly thrilled with what we've witnessed. That's in large part because it's owned/configured/managed by a team of network engineers at another location, so it's kind of a black box process for myself and our local engineer. As such, I'd like to run some things by people who know better than I as we put together a plan forward.

    1. Currently SnapManager creates a snapshot (separate data and log volumes) of our database server. That snapshot is mirrored to a standby server for DR purposes. That mirror is then cloned to another server for restore for consistency checks and internal reporting purposes. Frequently the clone step fails because the data and log volume snapshots are out of sync. If the clone snapshots can be out of sync can the same be true of the mirror? I'd hate to have to fall back on the DR copy to find it's out of sync.

    2. Removing (and adding) DBs without modifying SnapManager causes issues. We were told that whenever we do this, SnapManager has to be reconfigured. Knowing that, it's not truly volume replication but file replication within a specific volume. Sound about right?

    3. Our previous backup policy was taking a full DB backup on the weekend and differentials/logs during the day/week. Since implementing SnapManager, and because I don't trust a process I'm not in control of, we've switched to a daily full backup because SnapManager updates the last backup date and thus invalidates the (native) full backup. Anybody got a workaround for this?

    Thanks in advance for any replies and if I'm way off base here please let me know. This is a technology I'm not overly familiar with, have limited if any control over and which was "strongly suggested" for us to implement.

    -Nate

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • Nate, I'll try to answer each question the best I can:

    1. Currently SnapManager creates a snapshot (separate data and log volumes) of our database server. That snapshot is mirrored to a standby server for DR purposes. That mirror is then cloned to another server for restore for consistency checks and internal reporting purposes. Frequently the clone step fails because the data and log volume snapshots are out of sync. If the clone snapshots can be out of sync can the same be true of the mirror? I'd hate to have to fall back on the DR copy to find it's out of sync.

    --I'm not sure about this. We could never get cloning to work. My guess is that if the clone is a clone of the mirror then yes, it is possible the mirror could be out of sync. Definitely something worth validating.

    2. Removing (and adding) DBs without modifying SnapManager causes issues. We were told that whenever we do this, SnapManager has to be reconfigured. Knowing that, it's not truly volume replication but file replication within a specific volume. Sound about right?

    --The SMSQL configuration is separate from SSMS. There is absolutely no integration except it runs a job creation script at the end of configuration. This is one reason why SMSQL is a poor choice for consolidated SQL environments that have databases frequently added or dropped. SMSQL works at the volume level. It is a storage technology - not a database technology.

    3. Our previous backup policy was taking a full DB backup on the weekend and differentials/logs during the day/week. Since implementing SnapManager, and because I don't trust a process I'm not in control of, we've switched to a daily full backup because SnapManager updates the last backup date and thus invalidates the (native) full backup. Anybody got a workaround for this?

    --I'm not clear how it invalidates the full backup. I can see it breaking the log chain. Not sure. Mixing native and SMSQL is probably not a good idea simply for scheduling reasons though I completely sympathize with your concerns.

    Thanks in advance for any replies and if I'm way off base here please let me know. This is a technology I'm not overly familiar with, have limited if any control over and which was "strongly suggested" for us to implement.

    -- I've worked at a company with this problem and also consulted with companies with the same problem. Your story is familiar and common to most environments implementing SMSQL. It is usually a top-down implementation. Because of major investments in the NetApp storage, companies feel they have to leverage SMSQL (NetApp also sells it to them as a "feature") in order to be consistent with their storage solution. This is the wrong approach and will not scale. It will also put database recovery at risk. I've also seen companies have employees leave because they insisted on using SMSQL. The core problem I've found is a lack of understanding by storage teams and C-level leadership of how databases function. Many times they see no difference between database files and files on a file server.

    Good luck and let me know if there is any other way I can help.

    Scott
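
The differential-base concern in question 3 can be modelled with a rough sketch. It assumes that a SnapManager snapshot is recorded in the backup history as an ordinary full backup rather than a copy-only one, which is what the symptom Nate describes suggests but which the thread does not confirm. The `Backup` class and `differential_bases` function are hypothetical names used only for illustration; the `kind` codes mirror the `type` column of `msdb.dbo.backupset` ('D' full, 'I' differential, 'L' log).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    finished: datetime
    kind: str                    # 'D' = full, 'I' = differential, 'L' = log
    is_snapshot: bool = False    # taken by a snapshot tool rather than natively
    is_copy_only: bool = False   # copy-only fulls do not become a differential base


def differential_bases(history):
    """Pair each differential backup with the full backup it depends on.

    A differential is based on the most recent full backup that was
    not copy-only, regardless of which tool took that full backup.
    """
    base = None
    pairs = []
    for backup in sorted(history, key=lambda b: b.finished):
        if backup.kind == "D" and not backup.is_copy_only:
            base = backup
        elif backup.kind == "I":
            pairs.append((backup, base))
    return pairs


# Hypothetical history: native full on Sunday, snapshot "full" on Monday
# morning, native differential on Monday at noon.
native_full = Backup(datetime(2024, 1, 7, 1, 0), "D")
snapshot_full = Backup(datetime(2024, 1, 8, 1, 0), "D", is_snapshot=True)
differential = Backup(datetime(2024, 1, 8, 12, 0), "I")

for diff, base in differential_bases([native_full, snapshot_full, differential]):
    if base is not None and base.is_snapshot:
        print("Differential from", diff.finished, "needs the snapshot taken at", base.finished)
```

Under that assumption, the Monday differential can only be restored on top of the snapshot, not the Sunday native full, which would explain why the native weekly-full schedule stopped being usable. If the snapshot were flagged copy-only, the same function would pair the differential with the native full instead.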

  • Scott, thanks for responding. I think our case for moving back to native backups gained some traction last night: network performance issues prompted the network team to turn off mirroring. We've now gone 12+ hours with no DR backup. Awesome.

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • Nate - I ran into the same issue with adding / removing databases and posted a thread on the NetApp forums. I did receive a couple responses on how to work around this. I haven't implemented them but you can review the thread and see if you find anything helpful there.

    https://communities.netapp.com/thread/15539

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks David. I'll check it out.

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • The fun with SnapManager continues. It's been days now mirroring/reseeding a VLDB (5 TB). Yesterday during the mirror, we ran out of space on the log volume snap partition, which in turn took all the DBs offline. From what I can tell, given the fact that two of my DBs came back suspect, it's not a graceful shutdown.

    From SQL Log:

    Error: 17207, Severity: 16, State: 1.

    FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file 'E:\SQL_LOG\DBName.ldf'. Diagnose and correct the operating system error, and retry the operation.

    File activation failure. The physical file name "E:\SQL_LOG\DBName.ldf" may be incorrect.

    The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost due to a hardware or environment failure.

    Luckily this was to an internal dev/research db and the loss of data is tolerable. As such, I was able to bring the DBs online by detaching and re-attaching just the .mdf files.

    This is not the first time we've had issues with snap partitions filling up and DBs going offline. Due to our recent experience, this is a risk we're not willing to tolerate, and we are moving back to native backups and copy jobs.

    _____________________________________________________________________
    - Nate

    @nate_hughes
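
One way to get earlier warning of the out-of-space condition described above is a simple free-space check on the volumes the database server can see. This is a minimal sketch, not anything SnapManager provides: `free_fraction` and `volumes_below` are hypothetical helper names, the 20% threshold is an arbitrary assumption, and the check only sees what the operating system reports. Snapshot reserve on the filer itself would still have to be watched with the storage team's own tooling.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def volumes_below(paths, min_free=0.20):
    """Return the paths whose volume has less than `min_free` space left."""
    return [p for p in paths if free_fraction(p) < min_free]


if __name__ == "__main__":
    # "." stands in for the data and log volume mount points.
    for path in volumes_below(["."], min_free=0.20):
        print("Low free space on volume containing", path)
```

Run from a scheduled job, a check like this would at least flag a filling log volume before the databases go offline.
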
  • Did you set your retention low enough in the SnapManager Configuration utility?

    One note here: from what I understand, SnapManager for SQL Server is on the way out and they are moving to either CommVault's snapshot solution or something else rebranded. I don't have all the details, but some of the issues may go away with those changes. We'll see....

    David

    @SQLTentmaker

  • David, unfortunately the SnapManager process is owned and maintained by a separate group (storage/network) in another location. All things considered, I should be able to put together a nice little case study of how not to implement a backup and recovery plan after this.

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • As we stumble our way into using SnapManager, I read this article by a NetApp technician/blogger who says to avoid running index maintenance during "snaps."

    https://communities.netapp.com/blogs/abhishek/2012/06/05/things-to-keep-in-mind-when-running-index-optimization

    So my next question, since we're already "snapping" the data files over to a DR NetApp filer, is whether SnapManager "backups" use the same technology. We know from previous testing that SQL SnapManager backups do write to the SQL log, including information on the I/O freeze. Since we're already snapping the data files (I'm told) regularly using what I think is called SnapMirror, if SnapMirror uses the same technology as the "backups", then perhaps reindexing is not affected.

  • That is the same technology.

    We have halted our snapshot processes during some of our index maintenance but not all. It should work unaffected. If nothing else I would decrease frequency if you are doing very frequent snapshots. Pretty easy to do if you have the snapshots being executed via a SQL job as you can add a schedule that is specific for your maintenance window.

    Hope this helps.

    David

    @SQLTentmaker
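
The suggestion above, giving the snapshot job a schedule that avoids the maintenance window, comes down to a time-window test. The sketch below shows that check in isolation. `in_window` and `should_snapshot` are hypothetical names and the 01:00 to 04:00 window is an assumed example; in practice the same effect would come from the SQL Agent job schedule David mentions.

```python
from datetime import time


def in_window(now, start, end):
    """True if `now` falls inside [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_snapshot(now, maintenance_start=time(1, 0), maintenance_end=time(4, 0)):
    """Skip the snapshot while index maintenance is expected to be running."""
    return not in_window(now, maintenance_start, maintenance_end)


if __name__ == "__main__":
    for t in (time(0, 30), time(2, 30), time(12, 0)):
        print(t, "snapshot" if should_snapshot(t) else "skip")
```

The wrap-around branch matters for windows such as 22:00 to 02:00, where a plain `start <= now < end` comparison would never match.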

  • Thanks David. I'd like to hear more about this idea that NetApp may abandon SnapManager. Last Sunday we successfully took a final backup of our databases, shut down production and brought up the DR SQL Server. Our storage guy, who is still pretty new to all of this, had to manually point and click to bring the LUNs online. Then I had to attach all of the databases myself, something I thought SnapManager would do. Not a big deal since I scripted the attachment process.

    Next he needs to learn PowerShell and how to script bringing up the DR filer so that can happen quicker. One 3 TB "piece of disk real estate" containing scanned images didn't come online even though we thought it was being snapped along with the data files.

    One of the most difficult/frustrating things is the use of terms like backup, restore, etc., since they don't necessarily mean the same things in the storage world. The backups do seem to be "backups" since they write to the SQL log, but given that they complete in under two minutes for roughly 3 TB of databases, they are only capturing the "deltas" and are not in any way the same as a SQL database backup, which would take hours.
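
The scripted attach step mentioned in this post is not shown in the thread. A minimal sketch of one way to generate the statements follows. `attach_statement` is a hypothetical helper, and the database names and file paths are made up for illustration; the generated T-SQL uses the standard `CREATE DATABASE ... FOR ATTACH` form and would still need to be run against the DR instance once the LUNs are online.

```python
def attach_statement(db_name, files):
    """Build a CREATE DATABASE ... FOR ATTACH statement for one database."""
    # Escape ] in identifiers and ' in string literals, T-SQL style.
    ident = "[" + db_name.replace("]", "]]") + "]"
    file_list = ",\n".join(
        "    (FILENAME = N'" + path.replace("'", "''") + "')" for path in files
    )
    return "CREATE DATABASE " + ident + " ON\n" + file_list + "\nFOR ATTACH;"


if __name__ == "__main__":
    # Hypothetical inventory of databases and their files on the DR server.
    databases = {
        "Sales": [r"E:\SQL_DATA\Sales.mdf", r"E:\SQL_LOG\Sales.ldf"],
        "Imaging": [r"E:\SQL_DATA\Imaging.mdf", r"E:\SQL_LOG\Imaging.ldf"],
    }
    for name, files in databases.items():
        print(attach_statement(name, files))
        print("GO")
```

Keeping the inventory as data makes it easy to spot a volume that never came online, since the listed files can be checked for existence before any attach is attempted.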

  • It is true that NetApp is abandoning SnapManager in favor of CommVault's IntelliSnap. We are in the process of migrating to the same; it's in the planning stages at present. I'll certainly get back with feedback once I have something intelligent to add. 🙂

    As to the snaps being backups, they kind of are. You are correct that they are snapping deltas at the storage level, but they put it all together so that it can actually be restored to SQL in the sense we would understand, and they even allow the KEEP_REPLICATION option to be passed in as a valid input parameter. I'm not sure about all the details as I haven't gone too far into it, but I have tested that functionality as it is a critical piece for us. CV's IntelliSnap will allow for the same functionality.

    I do know that the CV product IntelliSnap is recognized by multiple storage vendors including NetApp, Hitachi and EMC. I'm sure the other big names are on the list as well. If that is the direction that you are heading, then it seems like you will be in good hands. Not sure if NetApp is recommending anything different at this point though.

    Should be fun.... 🙂

    David

    @SQLTentmaker
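
The "deltas" point can be illustrated with a toy model of a pointer-based snapshot. This is not NetApp's actual on-disk format, and the `Volume` class and its methods are invented for illustration. It only shows the general idea of why taking a snapshot can be near-instant (it records block pointers, not data) while still being readable as a complete point-in-time image.

```python
class Volume:
    """Toy volume where writes go to new blocks and snapshots freeze the block map."""

    def __init__(self):
        self.store = {}       # physical block id -> data
        self.active = {}      # logical block number -> physical block id
        self.snapshots = {}   # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, logical, data):
        # New data always lands in a fresh physical block, so any block
        # still referenced by a snapshot is never overwritten.
        self.store[self._next_id] = data
        self.active[logical] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the pointers are copied; no data blocks are duplicated.
        self.snapshots[name] = dict(self.active)

    def read(self, logical, snapshot=None):
        block_map = self.active if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[logical]]

    def changed_since(self, name):
        """Logical blocks whose contents differ from the named snapshot (the 'delta')."""
        snap = self.snapshots[name]
        return sorted(n for n, pid in self.active.items() if snap.get(n) != pid)


if __name__ == "__main__":
    vol = Volume()
    vol.write(0, "page A")
    vol.write(1, "page B")
    vol.snapshot("nightly")
    vol.write(1, "page B, modified")
    print(vol.read(1), "|", vol.read(1, "nightly"), "|", vol.changed_since("nightly"))
```

In this model the snapshot is a full, consistent view of the volume at the moment it was taken, even though only changed blocks consume extra space afterwards, which matches the "they kind of are backups" description above.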

  • It would be interesting to see something in black and white confirming that IntelliSnap will replace SnapManager, especially since we've been agitating for some training on SnapManager, and version 6 of SnapManager was just released. We've used CommVault for tape backups for years, but no longer write production SQL backups to tape.

  • Agreed. I don't have anything but vendor communication. Hopefully the post you put up on the NetApp forum will get some feedback.

    David

    @SQLTentmaker
