Differential backup too big

  • Since two weeks, my differential backups are getting too big. I have a 4 TB database. Our backup schedule: full backup every week, differential backup every night, log backup every 5 minutes. The full backup is 800 GB (compressed). After a few days of small differential backups, suddenly the differential backups become around 700 GB. This has happened this week, and last week. This database is not very active. All transaction logs are around 200kB, and the amount of changed extents is just above a thousand (according to a script from Paul Randal), less than 0.01%. So I'd expect a tiny differential backup.

    There are two more things that are worth mentioning: the database is a subscriber in transactional replication, and CDC is used on these articles. Both replication and CDC jobs appear to be working fine.

    At this moment, I have no idea where to look. Can anybody help?

    Robert van den Berg

    Freelance DBA
    Author of:

  • Was the TR and CDC in place for a good period of time before the large DIF problem occurred?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1) What is the exact script you used to identify changed extents?

    2) What changed between the period where DIFFs were always small and the recent issues?

    3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Another good bit of info to get would be whether most of the data backed up for the large differentials is from the data files or the log file.

    Something like the following query should point that out:

    SELECT bs.database_name,

    bs.backup_start_date,

    backup_MB=SUM(bf.backed_up_page_count)/128,

    bf.file_type

    FROM msdb.dbo.backupset bs

    INNER JOIN

    msdb.dbo.backupfile bf ON bs.backup_set_id=bf.backup_set_id

    WHERE bs.type='I' --differential backups

    AND

    bs.database_name='your_database_name'

    GROUP BY bs.database_name,

    bs.backup_start_date,

    bf.file_type

    ORDER BY bs.backup_start_date DESC,

    backup_MB DESC

    Cheers!

  • @jeff: yes, the TR and CDC were in place several weeks before we've been having this problem.

    @kevin:

    1) What is the exact script you used to identify changed extents?

    http://www.sqlskills.com/blogs/paul/new-script-how-much-of-the-database-has-changed-since-the-last-full-backup/

    2) What changed between the period where DIFFs were always small and the recent issues?

    The backup share filled up. Otherwise, I have no idea. All the "usual suspects" claim they did nothing, and I have no logging to suggest otherwise.

    3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?

    The only thing I know of, is the database reorganisation. But since this finished in zero seconds, and didn't cause the transaction log backups to grow above 200 kB, I'm guessing that's not it.

    @jacob: thanks for the script, I'll run that tomorrow morning.

    Robert van den Berg

    Freelance DBA
    Author of:

  • Robert vd Berg (10/17/2016)


    @Jeff: yes, the TR and CDC were in place several weeks before we've been having this problem.

    @kevin:

    1) What is the exact script you used to identify changed extents?

    http://www.sqlskills.com/blogs/paul/new-script-how-much-of-the-database-has-changed-since-the-last-full-backup/

    2) What changed between the period where DIFFs were always small and the recent issues?

    The backup share filled up. Otherwise, I have no idea. All the "usual suspects" claim they did nothing, and I have no logging to suggest otherwise.

    3) What activity was done the day/night that preceded the huge DIFF this week? Was last week the same scenario (small for one or two then huge on same day)?

    The only thing I know of, is the database reorganisation. But since this finished in zero seconds, and didn't cause the transaction log backups to grow above 200 kB, I'm guessing that's not it.

    @jacob: thanks for the script, I'll run that tomorrow morning.

    When posting links, please use the [ url] and [ /url] (no spaces inside the square brackets) to make the link "clickable".

    http://www.sqlskills.com/blogs/paul/new-script-how-much-of-the-database-has-changed-since-the-last-full-backup/

  • Thinking outside the box:

    1) Got filestream by any chance?

    2) In-Memory OLTP?

    3) Columnstore indexes?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hello Kevin,

    "No" to all three.

    Greetings,

    Robert

    Robert van den Berg

    Freelance DBA
    Author of:

  • Index reorgs / rebuilds?

  • Robert vd Berg (10/17/2016)


    Hello Kevin,

    "No" to all three.

    Greetings,

    Robert

    Next guess is perhaps the algorithm Paul posted in 2008 is off for 2014 databases for some reason. I do note he specifically mentions large databases as a potential issue.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Right now, I'm thinking that either someone made a code change that updates more of the database than before or someone "got the religion" about index maintenance or similar. I also think that the code that Jacob Wilkins posted above might give you a clue about that. Have you tried running his code? And are you sure that the weekly backup actually took place? Lot's of folks have backup code that won't fail if something goes wrong with a particular database backup but (hopefully) does log the failure. You might want to check MSDB for backup failures.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I ran the code from Jacob. This is the result for last nights backup:

    backup_start_datebackup_MBfile_type

    2016-10-17 21:40:38.0001723566.125000D

    2016-10-17 21:40:38.0001.609375L

    That is what I would expect, since the log file is practically empty, and all log backups are small (these small log backups are also why I have no doubt about the script from Paul Randal).

    @jeff: yes, I did double check that we in fact do have a full backup, both on disk and using the following code:

    SELECTdatabase_name, type, backup_finish_date, backup_size

    FROMmsdb..backupset

    WHEREbackup_finish_date > '2016-10-01'

    and database_name = 'my database'

    and type <> 'L';

    My next move will be to make a native backup (instead of using Red Gate). I doubt this will solve the problem, but it's worth a try.

    Robert van den Berg

    Freelance DBA
    Author of:

  • I found the cause, so I thought I'd let you know. Between the small diffs and the larger diffs was a failed full backup (due to the backup share being full). I replicated that, by forcing a full backup and, while that was running, killing the backup service. The next diff is currently running, and it will be large, even though the number of changed extents was lower after the crashed full backup than before.

    Robert van den Berg

    Freelance DBA
    Author of:

  • Robert vd Berg (10/19/2016)


    I found the cause, so I thought I'd let you know. Between the small diffs and the larger diffs was a failed full backup (due to the backup share being full). I replicated that, by forcing a full backup and, while that was running, killing the backup service. The next diff is currently running, and it will be large, even though the number of changed extents was lower after the crashed full backup than before.

    Please open a ticket with Microsoft about this, or at least report it on Connect. I wonder if it should work that way and/or if there is something that can be done to make it not do that. I suspect not, but can't hurt to ask. At first blush it sounds like all the bit-mapped DIFF pages are wiped clean at the start of the FULL backup but I would think perhaps this shouldn't be done until the very last thing to avoid this very scenario.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for that update!

    From looking at the log it looks like it probably is something along the lines of what Kevin suggested.

    For both failed and successful full backups, there's a transaction "Backup:InvalidateDiffMaps" that occurs very early in the backup, and Paul Randal suggests at http://www.sqlskills.com/blogs/paul/debunking-a-couple-of-myths-around-full-database-backups/ is the clearing of the DCM, which makes sense.

    The failed backup just seems to be missing the commit of the differential base, and some writes to the file headers and boot pages, which is expected.

    It seems then that SQL Server knows it can't rely on the diff maps for subsequent differentials, and does differentials differently in that case.

    I can reproduce the increased size easily enough by cancelling a full backup and then running a diff.

    Differential backups after a failed full also switch to processing every page in the DB, even though they doesn't necessarily include all of them in the backup.

    This behavior was also pointed out in the blog post here: https://sqlscope.wordpress.com/2014/12/06/failed-full-backup-can-impact-next-differential-backup/

    It definitely seems like something intentional on the part of MS, but it would be nice if it were better (at all) documented.

    Cheers!

    EDIT: Added what detail I was able to get from the log.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply