Backups, Backups, Backups.......

  • I work as a DBA for a medium size municipal government organization. I work together with another DBA and together we support all Oracle and SQL Server environments.
    We are in the application support group. I started less than 1 year ago at this organization but have been working as a DBA for many years.

    Approximately 3 or 4 years ago they implemented a backup solution from HP that was intended to be a single solution for all our backups, servers file systems and databases.
    This would allow one group - the infrastructure group (sysadmins specifically) to manage all the backups and only one group would need to be on call. So we as DBA's are not currently on call.
    We have an appliance for backups called HP Storeonce. It comes with an interface that we as DBA's do not have access to.  We don't control the scheduling of these backups and currently don't have any influence on it at all. We can't align the backups with any other maintenance and we can't perform database restores with these backups.

    I noticed when I started that we generally run full backups once per day. Some databases were not being backed up, some were in Full Recovery without transaction log backups and others had
    transaction log backups running immediately after the full backups to prevent those pesky transaction logs from growing too much.  Recovery would generally be limited to up to 24 hours ago in the event of a disaster.  The business units we support would have no understanding of our backup processes and the implications in the event of a disaster.

    Since I've been here I've never seen one of these backups from StoreOnce be restored. As DBA's, we do run backups for our purposes, generally to local disk. These are mostly used for refreshing
    DEV and QA environments, testing etc. Since I started I've been advocating for us to take over responsibility for the backups. We can do this by presenting a share created off the StoreOnce applicance -
    this would have the benefit of deduplication that the appliance offers and we could use native database tools to manage the backups. We could simply present a share that each database server could access,
    something like \\Backup_Share\SQL_Backups\Server\FULL (another folder for transaction log backups). We would then use Ola Hallengrens maintenace solution to script the backups.

    I've been pushing since I've started and at every turn there has been one reason or another why it wouldn't work. But with some persistence we may be close to getting what we want.
    I've been given the following restrictions;

    -    AD to be used for authentication.
       This is a non-issue, the SQL Server Service account would need permission to the share.
    -    Each DB server will have its own share/directory and will not have access to any of the other servers’ backup shares/directories.
       This would be an issue for me, if we wanted to refresh the DEV database from production how would this happen without the DEV server having permission to the production backups. I just don't see why this is necessary.
    -    Each DB server will use a server specific service account for connecting to the dedicated share/directory. This account will have a logon restriction that will only allow it to logon from the server it is assigned to.
       Not an issue, other than they could only access their specific backups.
      
    The infrastructure manager made reference to the server getting exposed to randsomeware, IE it is infected with randsomeware and if the server had access to all backups they would all be at risk. Not sure how this could happen without the AD account being comprimised.
    Does this seem reasonable? How does everyone else handle their backups?
    Any thoughts and direction would be greatly appreciated!

  • Overall, this seems reasonable to me. I like separation of each server/instance to its own account and its own backups. Anyone logging into a server could run IE, and someone will. I'd bet someone has. Humans are often the weak link, so limiting access is always good. There are holes and bugs, and as we've seen the last couple weeks, potential hardware issues with SQL Server.

    That being said. Development shouldn't be restoring production directly.   I know you might not have PII or sensitive data, but you're creating an attack vector for data to be lost/released. Or is Dev secured tightly? I assume you're running DBCC in production? If not you need to or restore this elsewhere and run it there. For refreshing dev, you can implement a process to copy the backup file, though I'd want some obfuscation/masking to take place before development gets the data.

  • I'm in a similar boat to you, in that I have no control over the setup / backup of my databases except by contacting the person who admins our "enterprise" backup software.  Thankfully I could tell them what sort of schedules to set, but I'm trying to get the backups under my control, at least for the databases.

    As for your infrastructure managers concerns, you can further reduce the potential risk footprint from ransomware by having the AD team create and deploy a GPO to block the execution of Internet Explorer / Edge / Firefox / Chrome / etc on servers.  We have such a policy here in place and frankly, it's not as big an inconvenience as you might expect.

    I would also suggest tossing this link to your IM as well, to help explain *why* you want to control the database backups:
    https://spaghettidba.com/2017/09/13/expensive-enterprise-backup-tools-a-survival-guide/
    Some of it, he'll probably look at and scoff at.  Some of it, though, might make him think more about giving you said control.  But most important, work *with* him, rather than standing firm on what you want.  Not being able to easily restore prod to QA / Dev environments?  Work with him on that, maybe get an extra share that you can run a "one-off" backup to from production that QA / Dev can read from, that is not used for anything else (and possibly even with a policy or scheduled task to empty out said location daily.)
    But, also, as Steve said, be careful about putting production data in dev environments.

  • I guess my recommendation would be... to educate the people involved as to what RPO and RTO actually means and what the impact to the business would be for (for example) the rather miserable 24 hour RPO that they currently seem to have and the fact that there might be no RTO possible unless someone actually verifies that the backups are restorable.

    As for Dev and Staging boxes, a lot of people are working their butts off to get code developed and tested.  For that reason, I treat both boxes just like the Production box and have an even tighter RPO and RTO because things happen and people DO accidently delete stuff (data and or objects) in the "heat of battle".  To have lost a two day test setup would be stupid when something is so easy to recover from simply by doing the right thing with backups and testing restores.  And, yes... I restore my two largest databases every night right after the backups complete.  Granted those two databases only take up about 2TB of space but it's essential that we be able to quickly restore if all the other fancy methods that the infrastructure team has (very well) put together.  It also gives me a regular measurement of precisely what the RTO will be in case of such an event.  Even my 2TB telephone system (we're required to keep call recordings forever) has been setup for an otherwise insane "get back in business time" of only 6 minutes and a total RTO of only a few hours (for the remaining 8 years of calls) because of the way we have backups constructed, which is also a strong justification that "one size" DOES NOT fit all.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - SSC Editor - Wednesday, January 17, 2018 7:47 AM

    Overall, this seems reasonable to me. I like separation of each server/instance to its own account and its own backups. Anyone logging into a server could run IE, and someone will. I'd bet someone has. Humans are often the weak link, so limiting access is always good. There are holes and bugs, and as we've seen the last couple weeks, potential hardware issues with SQL Server.

    That being said. Development shouldn't be restoring production directly.   I know you might not have PII or sensitive data, but you're creating an attack vector for data to be lost/released. Or is Dev secured tightly? I assume you're running DBCC in production? If not you need to or restore this elsewhere and run it there. For refreshing dev, you can implement a process to copy the backup file, though I'd want some obfuscation/masking to take place before development gets the data.

    Thanks Steve, we don't have large requirement for masking.  Most the access to DEV matches production at least for the instances that this applies.  I would have thought that restricting to one account for Prod, QA and DEV would be sufficient.  That way we could automate the restores.  Under to proposed configuration I wouldn't be able to complete a restore to DEV myself without a sysadmin allowing the copy or copying the file.  I'm just not sure exactly what this configuration protects against - these servers are restricted and have no internet access and are behind a corporate and windows firewall.

  • jasona.work - Wednesday, January 17, 2018 8:42 AM

    I'm in a similar boat to you, in that I have no control over the setup / backup of my databases except by contacting the person who admins our "enterprise" backup software.  Thankfully I could tell them what sort of schedules to set, but I'm trying to get the backups under my control, at least for the databases.

    As for your infrastructure managers concerns, you can further reduce the potential risk footprint from ransomware by having the AD team create and deploy a GPO to block the execution of Internet Explorer / Edge / Firefox / Chrome / etc on servers.  We have such a policy here in place and frankly, it's not as big an inconvenience as you might expect.

    I would also suggest tossing this link to your IM as well, to help explain *why* you want to control the database backups:
    https://spaghettidba.com/2017/09/13/expensive-enterprise-backup-tools-a-survival-guide/
    Some of it, he'll probably look at and scoff at.  Some of it, though, might make him think more about giving you said control.  But most important, work *with* him, rather than standing firm on what you want.  Not being able to easily restore prod to QA / Dev environments?  Work with him on that, maybe get an extra share that you can run a "one-off" backup to from production that QA / Dev can read from, that is not used for anything else (and possibly even with a policy or scheduled task to empty out said location daily.)
    But, also, as Steve said, be careful about putting production data in dev environments.

    Thanks!  The link is great,  this is exactly what I have been struggling with.  Data Domain and HP StoreOnce have the ability to create a share off the applicance.  The share leverages the dedup of the appliance.  This way you can use native tools to write backups.  This is what I am proposing.  I just feel as if the implementation of the share is overly restrictive.

  • Jeff Moden - Wednesday, January 17, 2018 10:41 AM

    I guess my recommendation would be... to educate the people involved as to what RPO and RTO actually means and what the impact to the business would be for (for example) the rather miserable 24 hour RPO that they currently seem to have and the fact that there might be no RTO possible unless someone actually verifies that the backups are restorable.

    As for Dev and Staging boxes, a lot of people are working their butts off to get code developed and tested.  For that reason, I treat both boxes just like the Production box and have an even tighter RPO and RTO because things happen and people DO accidently delete stuff (data and or objects) in the "heat of battle".  To have lost a two day test setup would be stupid when something is so easy to recover from simply by doing the right thing with backups and testing restores.  And, yes... I restore my two largest databases every night right after the backups complete.  Granted those two databases only take up about 2TB of space but it's essential that we be able to quickly restore if all the other fancy methods that the infrastructure team has (very well) put together.  It also gives me a regular measurement of precisely what the RTO will be in case of such an event.  Even my 2TB telephone system (we're required to keep call recordings forever) has been setup for an otherwise insane "get back in business time" of only 6 minutes and a total RTO of only a few hours (for the remaining 8 years of calls) because of the way we have backups constructed, which is also a strong justification that "one size" DOES NOT fit all.

    Thanks for the reply Jeff.  In our organization I don't generally communicate directly with the business units.  I think if I contacted them directly it wouldn't go over very well.  I think I'm close to having control of backups but the share restrictions just make restores difficult.  And in my mind the restore is the most important part.  I think I mentioned that in the 7 months I've worked here there has not been 1 restore using our enterprise backup appliance.

  • I know this post is a couple of months old now but I am in the process of implementing Ole Hallengren's maintenance solution to utilise the StoreOnce appliances my Company purchased. There is a Catalyst Plug-In for SQL Server which is installed on each server that you want to backup, allowing you to backup straight to the StoreOnce appliance (without the need for a share as such). You can then run an executable from a CLI window or install an extended stored procedure (HPE won't be moving away from this until late 2019) which can be used from within Management Studio. I have had no issues restoring from StoreOnce and have been able to automate some of those restores although I haven't considered it for replacing the logshipping we already have in place - yet. If you haven't been able to resolve everything yet, please ask and I will see if I have encountered anything similiar.

  • jackimo - Friday, January 19, 2018 12:09 PM

    Jeff Moden - Wednesday, January 17, 2018 10:41 AM

    I guess my recommendation would be... to educate the people involved as to what RPO and RTO actually means and what the impact to the business would be for (for example) the rather miserable 24 hour RPO that they currently seem to have and the fact that there might be no RTO possible unless someone actually verifies that the backups are restorable.

    As for Dev and Staging boxes, a lot of people are working their butts off to get code developed and tested.  For that reason, I treat both boxes just like the Production box and have an even tighter RPO and RTO because things happen and people DO accidently delete stuff (data and or objects) in the "heat of battle".  To have lost a two day test setup would be stupid when something is so easy to recover from simply by doing the right thing with backups and testing restores.  And, yes... I restore my two largest databases every night right after the backups complete.  Granted those two databases only take up about 2TB of space but it's essential that we be able to quickly restore if all the other fancy methods that the infrastructure team has (very well) put together.  It also gives me a regular measurement of precisely what the RTO will be in case of such an event.  Even my 2TB telephone system (we're required to keep call recordings forever) has been setup for an otherwise insane "get back in business time" of only 6 minutes and a total RTO of only a few hours (for the remaining 8 years of calls) because of the way we have backups constructed, which is also a strong justification that "one size" DOES NOT fit all.

    Thanks for the reply Jeff.  In our organization I don't generally communicate directly with the business units.  I think if I contacted them directly it wouldn't go over very well.  I think I'm close to having control of backups but the share restrictions just make restores difficult.  And in my mind the restore is the most important part.  I think I mentioned that in the 7 months I've worked here there has not been 1 restore using our enterprise backup appliance.

    Yowch.  I don't envy that position at all.  Hang in there and keep trying for what's right.  Hopefully they won't need to find out if a restore actually works between now and when they finally come to their senses.  My hat's off to you for your perseverance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply