LSN is broken and I'm unsure why..

  • To start with.. I'm relatively new to being a DBA.

    I have a backup plan that is the following in a test environment: Weekly full on Sunday at Midnight, Daily Diff at Midnight, and 30 minute log backups every day.  

    I set up a job that checks these log backups that basically creates a restore script in powershell runs a restore header only on the latest full, latest diff ( if its newer than the latest full) and any log backups that are newer than the latest diff or latest full depending on which is newer.  It populates this information into a table then compares the results to verify that the LSN chain stays intact.  

    This job ran fine on Thursday, Friday, Saturday until Sunday at 2:30 in the morning.  So this means, it took the Full on Sunday night, ran fine for 2 hours, then broke at 2:30 inexplicably.  I checked event viewer, and didn't see anything of note there.  There are no other backup jobs that run at that time.  There is a clean up job to clean up old backups but that runs the same time as the full.. so there shouldn't have been any file back ups at that time, this is also a test server that is pretty much used just for DBA testing, so no other users should have been in the box at the that time to delete or copy the backup.  There was no recovery model change or snapshot restores at this time.  I unfortunately don't have a copy of the LSNs or the restore script that was produced by this because it is overwritten every time the jobs run and a full was taken earlier that made the job start working again.

    I'm posting basically because I want to know what other causes there could be for the LSN chain to break?

  • oogibah - Monday, December 3, 2018 12:03 PM

    To start with.. I'm relatively new to being a DBA.

    I have a backup plan that is the following in a test environment: Weekly full on Sunday at Midnight, Daily Diff at Midnight, and 30 minute log backups every day.  

    I set up a job that checks these log backups that basically creates a restore script in powershell runs a restore header only on the latest full, latest diff ( if its newer than the latest full) and any log backups that are newer than the latest diff or latest full depending on which is newer.  It populates this information into a table then compares the results to verify that the LSN chain stays intact.  

    This job ran fine on Thursday, Friday, Saturday until Sunday at 2:30 in the morning.  So this means, it took the Full on Sunday night, ran fine for 2 hours, then broke at 2:30 inexplicably.  I checked event viewer, and didn't see anything of note there.  There are no other backup jobs that run at that time.  There is a clean up job to clean up old backups but that runs the same time as the full.. so there shouldn't have been any file back ups at that time, this is also a test server that is pretty much used just for DBA testing, so no other users should have been in the box at the that time to delete or copy the backup.  There was no recovery model change or snapshot restores at this time.  I unfortunately don't have a copy of the LSNs or the restore script that was produced by this because it is overwritten every time the jobs run and a full was taken earlier that made the job start working again.

    I'm posting basically because I want to know what other causes there could be for the LSN chain to break?

    Guessing, because I did the same thing, but here goes...
    The full backup runs at 12 midnight, and takes 1.5 hours to run. 
    The log backup kicks off at 12:30, runs again at 1:00, 1:30, and 2:00.
    You process only picks up the log backup that occurred at 2, missing the log backups from 12:30, 1, and 1:30

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Another backup was taken - could be a VM snapshot was taken at that time which generated a new base for the differential which would be different than what you are recording.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Your guess makes a lot of sense, but does not apply here, these databases are all very small databases, the full back up takes less than 10 minutes to complete.
    Thank you for the quick response :).

  • Jeffrey Williams 3188 - Monday, December 3, 2018 12:23 PM

    Another backup was taken - could be a VM snapshot was taken at that time which generated a new base for the differential which would be different than what you are recording.

    So we don't do snapshots specifically for our VMs, but there are backups taken within that timeframe, that is an interesting hurdle to jump through though given that the vms are backed up in chunks at a time not per server so they can vary greatly depending on the time of the backup..  Good insight, I'll look into this.

  • oogibah - Monday, December 3, 2018 12:28 PM

    So we don't do snapshots specifically for our VMs, but there are backups taken within that timeframe, that is an interesting hurdle to jump through though given that the vms are backed up in chunks at a time not per server so they can vary greatly depending on the time of the backup..  Good insight, I'll look into this.

    VM backups perform a freeze and thaw in SQL Server and then perform a snap of the VM - you can review the SQL Server logs to see if that is occurred.  If you are backing up your SQL Server VM's this way - then you cannot utilize differential backups - or, you have to modify your native full backup as a copy-only backup and use the VM snapshot as the differential base (which isn't practical because you cannot restore and leave the database in a state to apply differential or log backups).

    The better option is to remove dedicated SQL Server VM's from the VM backups and rely on other methods to get the native backup files off that system (which you should be doing anyways as you need those files available somewhere else if your VM becomes corrupted).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 - Monday, December 3, 2018 12:37 PM

    VM backups perform a freeze and thaw in SQL Server and then perform a snap of the VM - you can review the SQL Server logs to see if that is occurred.  If you are backing up your SQL Server VM's this way - then you cannot utilize differential backups - or, you have to modify your native full backup as a copy-only backup and use the VM snapshot as the differential base (which isn't practical because you cannot restore and leave the database in a state to apply differential or log backups).

    The better option is to remove dedicated SQL Server VM's from the VM backups and rely on other methods to get the native backup files off that system (which you should be doing anyways as you need those files available somewhere else if your VM becomes corrupted).

    Thanks for the information, do you know if VMware diffs will also cause the LSN chain to break or if it is just the fulls.. because he takes an incremental every day but it worked fine the previous 2 days.

  • Full or Diff backups do not affect the log chain - they are independent from backups.  Full backups set the differential base LSN - and differentials are tied to the previous full backup based on the differential base LSN.

    The restore path from a full backup can be either:

    1) Full Backup and all transaction log backups from the full through the point in time to be recovered
    2) Full Backup - Differential Backup - and all transaction log backups from the differential through the point in time

    For #2 - it doesn't matter which differential backup is restored, what is important is that you have all transaction log backups from that point in time through the point in time being recovered.

    If the VM incrementals were backing up SQL Server - that would generate a new full backup - and all following differentials would be based on that full backup.  The transaction log backups are not affected by the VM backup and can be used with any full backup as long as you have an unbroken chain of transaction log backup files from that full through the point in time being recovered.

    As for your process - I believe you are capturing data that you already have logged in MSDB.  It really isn't necessary...what you should be concerned with is making sure you have all files available to recover from the full backup to any specified point in time following that full backup.  For example, what would you need to be able to restore your system to Wednesday, 11/28 @ 6am?

    And just to make it more interesting - you find out that the differential backup file taken that morning has been corrupted and cannot be used.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • in addition to what Jeff said, here is how you can find out which files are needed.

    -- choose database or currently used database will be selected
    DECLARE @db_name VARCHAR(100)
    SELECT @db_name = DB_NAME()

    -- Get Backup History for required database

    SELECT
    s.server_name,
    s.database_name,
    CASE s.[type]
    WHEN 'D' THEN 'Full'
    WHEN 'I' THEN 'Differential'
    WHEN 'L' THEN 'Transaction Log'
    END as BackupType,
    s.user_name,
    s.backup_start_date,
    m.physical_device_name,
    cast(CAST(s.backup_size / 1000000 AS INT) as varchar(14))
    + ' ' + 'MB' as bkSize,
    CAST(DATEDIFF(second, s.backup_start_date,
    s.backup_finish_date) AS VARCHAR(4)) + ' '
    + 'Seconds' TimeTaken,
    CAST(s.first_lsn AS varchar(50)) AS first_lsn,
    CAST(s.last_lsn AS varchar(50)) AS last_lsn,
    s.recovery_model
    FROM msdb.dbo.backupset s
    inner join msdb.dbo.backupmediafamily m ON s.media_set_id = m.media_set_id
    WHERE s.database_name = @db_name
    ORDER BY backup_start_date desc,
    backup_finish_date

  • So, I verified that it does seem to be the differential of the VMware snapshots that was causing the issue ... we are going to be moving just to doing standard backup of the back up drive and that should solve the issues.

    Thanks for your help guys.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply