Backup has stopped working

  • I have a stored procedure that has done my backups automatically for years. It was originally set up for the free version of 2008 R2, since I didn't have funds for a paid version. I eventually went to a paid version, and now am on 2014, but the script has worked fine for years, so I have continued to use it. It uses a Windows scheduled task to fire a VBScript routine, which then calls a SQL Server stored procedure.

     

    That procedure has suddenly stopped working, and I have no idea why.

     

    The core of the procedure is a call that finds the most recent data and time there was any sort of change in the database. There are several databases on the instance, and the VBScript routine calls the stored procedure for each database that I want to back up. (Some are static and no longer being developed, but only read, so those are removed from the routine backup procedures.)

     

    There are three types of backup this routine does – transaction log on 15, 30 and 45 minutes of an hour, differential on every hour, and full every night at 11 pm. Each run looks at the last changed date/time for the addressed database, and only does the backup if there has been a change (of any sort) to the database in the last interval – 15 minutes, hour, or day, depending on which sort is currently being requested. The reason for that is that I save each backup as a separate file, compress it with WinZip and save it off to a specific folder for backups, and I don't want to create lots of backups of no change. Weekends and nights typically have zero activity, and I don't want to store that. A little complex, but again, all has worked flawlessly for years.

     

    Suddenly, the core query has stopped returning what I consider proper results. This core query is here:

    SELECT Max(IsNull(last_user_update,0)) FROM sys.dm_db_index_usage_stats WHERE database_id = DB_ID(@DBName)

     

    The field database_id is supposed to have the id of the database, but it suddenly does not. The entire thing only has 16 rows, all of which are under ONE database_id number, the ID of the built-in MSDB database. There is nothing there from any of the actual databases there, so naturally, the stored procedure does not work, since all is dependent on it retrieving the correct date from here. The databases are in daily use, so there is activity going on.

     

    What can be causing this?

    • This topic was modified 2 years, 7 months ago by  pdanes.
    • This topic was modified 2 years, 7 months ago by  pdanes.
  • To me it looks like whatever process you have for populating the parameter (@DBName) is busted.

    Now as to why it isn't working, you would need to look into what populates that parameter.  If you are pulling it from sys.databases (for example), run the query as the same user who is running the backups and verify that you get the results you expect.

    My guess is that it is permission related somewhere OR some issues on the database or instance (such as database was detached or dropped).

    My opinion though - I would do a TLOG backup and differential and full even if no data had changed.  Without seeing all of your code, I am just guessing that the "data changed" applies to the previous backup with conditions.  What I mean is that if the TLOG backup has no data changed since the last TLOG, Differential, or full backup, then you skip that TLOG backup.  BUT with differential (for example), it is looking for data changed since the last differential and full is looking for data changed since the last full.  To me this sounds like a lot of extra overhead in an effort to save disk space.  This also adds a bit of overhead onto your restore process.

    But, if you are not looking to change the backup process, I would be looking at what populates that parameter as to me it sounds like it is no longer being populated properly.  BUT that is just a guess.

    The other thing you could try is to run that query with a SELECT TOP 5 * or even a SELECT DISTINCT database_id and check that things are being logged in there as you expect.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Of course lots of changes could be made to a db that would not be reflected in sys.dm_db_index_usage_stats.  For example, file(s) added; user(s) added/removed; permissions added/removed; etc..

    The most likely causes: variable @DBName contains an unexpected value (as noted by Brian) AND/OR the user/context running the code somehow only has permission to see the msdb db.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • No, the database name is correct. From the VBScript routine, it is passed as a string literal, and those have not changed, nor has the name of any of the databases that I wish to back up.

     

    When I execute this manually from SSMS:

    SELECT * FROM sys.dm_db_index_usage_stats

    I get only the 16 rows I mentioned in the original post - 16 rows TOTAL, with NO conditional clause. There is NOTHING there from any of the active databases.

     

    As for other things changing, I also have some similar selects that look for structure changes, and use the most recent of any of the retrieved dates to test for 'changed' status in deciding whether or not to perform a backup. I excluded those from the description of the problem for the sake of simplicity, and since they are  not germane to the issue, whish is that sys.dm_db_index_usage_stats is practically empty - the core of the problem.

  • I know this sounds a bit stupid but I can't see what you see.  My first check would be to see if the "missing" databases actually exist any more.  My second check would be to see if some privs had changed.  My third check would be to verify that everything was actually pointing to the correct instance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, now I feel like a complete dork. I re-connected to the server this morning, which was still logged into my account from last week, and still  had SSMS open with the queries I was using to examine the environment. The databases are all there, of course. I am not quite that senile yet, that I would not notice a database gone. There are only six in this instance, and the critical ones are in daily use. A missing database would not go long unnoticed. But I ran the simple query cited above:

    select * from sys.dm_db_index_usage_stats

    intending to post a screenshot of the SSMS window showing the databases and the sparse population of the resulting dataset, but instead of 16 rows, I now have 52, and some of them have the ID of the databases I want to back up. And the popuation is increasing - just ran it again and the count is now up to 98, with an additional database showing up now that people have come in to work.

    I have no idea what happened - there have been unintended server re-starts recently (it's an old machine and starting to show some wear), but they have happened before and not impacted the backup routines. And these recent re-starts do not correlate in time with the backup failures.

    I appreciate the suggestions, but I think I'd like to table this post for now. I have some investigating to do and no real clue on exactly where to start, so there is no point in tying up bandwidth here when I have nothing useful to post.

    Thank you all for the ideas - I'll post back here when/if I make some headway on understanding what is happening.

    • This reply was modified 2 years, 7 months ago by  pdanes.
  • Been there and done that.  Totally understood.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So here's a period at the end of the chapter, although a rather unsatisfying one.

    I played with the T-SQL routine that does the actual backup, tidying up the syntax, adding some formatting to print statements that execute when the code is executed with a special Debug parameter, changing a few variable names and similar busy work, preparing to see if I could track down exactly what is going on in the routine. Somewhere along the way it started working again.

    The scheduled task has been active all along, but the routine always reported that there were no changes since the last backup and quit. Suddenly, for no reason that I can see, it has started reporting changes discovered during the previous interval and executing the backup procedure. I have been working with the live version of the procedure, since it was not working anyway – I wasn't worried about disrupting a functioning process.

    The sys. views are now showing recent activity, as they should, changes are getting detected, and backups are being created and stored in their proper place.

    I have no idea why the stored procedure suddenly stopped working, nor why the sys. views did not show recent activity, nor what I did to the procedure to make it suddenly start working again, nor why the sys. views suddenly have proper information again. I didn't think I changed anything of substance in the procedure – mostly tidying up formatting in preparation for actual debugging, and I certainly didn't do anything that would stop or start the databases logging their activities – I wouldn't even know how.

    But it works, and I have other things to do, so I'm going to chalk this up to database gremlins and get on with my life. Not the ideal outcome, but sometimes you just have to accept that the world can be a bit capricious.

    • This reply was modified 2 years, 7 months ago by  pdanes.
  • Interesting.  Thank you for taking the time to post the feedback.

    Out of curiosity, when is the last time the SQL Server Service was restarted?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Around two weeks ago - I don't recall exactly and I'm not at work right now. I'll check. There are also sporadic seeming power failures in this machine, and I have never been able to track down why. It is on a UPS, it has dual power supplies, both working, tested, swapped for replacements several times, but every once in a while, I login and discover that it has restarted recently. But the backup started failing well before the most recent restart. It was exactly such an unintended restart of the whole machine that prompted me to start digging around, and I just happened to notice the backups had suddenly stopped accumulating.

    That's one of the problems with automation - things are put in place, tested, they work, and I tend to forget about them. It's not the first automated system that has suddenly borked, usually because the IT department changes something and doesn't tell me.

    One virtual machine completely disappeared on me. There was some sort of upgrade to the OS, which caused a problem in startup. The virtual overstructure had no mechanism for detecting the problems inside one of its children, backups were made but older  ones were deleted, and by the time I noticed that the virtual machine was not runing, ALL functional backups had been superseded by newer, non-functional backups, and the entire configuration was toast. I had some rather choice words for the IT department that time, and they were apologetic, but the damage was done. Other things have gone MIA in similar fashion - they close a port on the domain controller, or re-map some IP addresses, and wait to see if anyone bitches. But when it impacts something automated, or used infrequently, it can be a long damn time before anyone notices a problem. I do not know what to do about that.

  • Seems crazy but it's gotten to the point for a whole lot of folks where they need a "conductor" for the "symphony of automation" they've constructed to make sure that everything is playing the right note at the right time because automation can report it's down if it's down.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • "...I eventually went to a paid version, and now am on 2014, ..."

    You could take the opportunity to set up a simpler backup strategy using native SQL. Some people don't like maintenance plans, but they are easy to understand and take just a few clicks to implement.

    • This reply was modified 2 years, 7 months ago by  homebrew01.
  • Jeff Moden wrote:

    Seems crazy but it's gotten to the point for a whole lot of folks where they need a "conductor" for the "symphony of automation" they've constructed to make sure that everything is playing the right note at the right time because automation can report it's down if it's down.

    I always like the "Wait, I didn't get an email telling me that it worked today" monitoring.

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Last re-start was just about a week ago. When and how many before that I don't know, but as I wrote earlier, this machine restarts occasionally for no reason that I can figure out. Going into the ILO logs it always shows power failure, but with it already on a UPS and dual power supplies, I don't know what to do about it, short of replacing the entire machine.

  • I've started looking at designing a system of mutual controls, set up on numerous systems, which would periodically interrogate each other and report to a central point when one of the members failed to respond properly. Supposedly there are already such monitoring systems available for purchase, but I could easily wait years before anything like that got approved  here.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply