Transaction Logs in Status 2 for no apparent reason

  • I've got a SQL 2012 Server in a 3 Node Always On Configuration. Version: 11.0.5058, so SP2. We are finding 2 of the (31) databases regularly have their transaction logs blow out, with all VLFs in Status 2, but can't find the cause.

    We have done the following checks.
    1)      Regular Transaction log backups - every 30 minutes using CommVault
    2)      Always On is showing as synchronised
    3)      There's no Mirroring or CDC
    4)      log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)
    5)      Removing one of the databases from the AG seems to have fixed the problem for that DB. Further testing is still needed. This is not a long term solution.

    Some odd things we have found, but that don't seem likely to cause an issue:
    1)      CommVault is being used - I've got no control of this.
    2)      All The DBs that don't have issues only getting one daily full and 30 minute Tran Log backups.
    3)      One problem DB is getting hourly full and 1/2 hourly log backups
    4)      The other problem DB appears to have backups to 2 virtual files, as we would do to improve backup times.

    Does anyone have any other things we can look into? Are there any know issues like this? I've found one current, but not very helpful article an the web: VLF Status = 2

    Cheers
    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Are the log backups being done with no_truncate or copy_only options?

    If you run a checkpoint after the log backup, what does the log_reuse_wait change to (it shows the thing that last prevented VLFs from being marked reusable)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Leo.Miller - Sunday, August 20, 2017 7:52 PM

    1)      Regular Transaction log backups - every 30 minutes using CommVault

    30 mins may not be enough for a busy database

    Leo.Miller - Sunday, August 20, 2017 7:52 PM


    3)      There's no Mirroring

    And at the core, that's exactly what Availability Groups are 😉

    Leo.Miller - Sunday, August 20, 2017 7:52 PM


    4)      log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)

    is it taking a log backup and truncating the log?

    Leo.Miller - Sunday, August 20, 2017 7:52 PM


    5)      Removing one of the databases from the AG seems to have fixed the problem for that DB.

    Indicates that log usage due to a mirror session is a possible issue

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • All good points and I'm aware of these. e.g. Mirroring is the core of AoHA. Just covering all the bases. Log backups take <1 second and at most 5 seconds, so I can't see a problem with these being at 30 minute intervals. I prefer more frequent log backups myself, but this is the SysAdmin's call.

    We were looking to see if the AoHA (mirroring) was causing the problem, but the log_reuse_wait_desc doesn't show this. The AoHA status shows as being Synchronised and testing show no outstanding synchronised updates. Updates on the primary are visible on the secondary within milliseconds. 

    No, no_truncate and/or copy_only are not being used.

    This is a live system so the testing we can do is limited. We can't leave the DBs in the AG without some risk, but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently). Hence we are still looking for a chance to check what doing a manual checkpoint shows up.

    All the databases are using the same routines, same backup config via CommVault, same AoHA group, etc. but only two of the DBs are showing this odd behaviour, MailMarshal and LANSweeper.

    Thanks for the ideas so far.
    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller - Tuesday, August 22, 2017 6:15 PM

    but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).

    And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
    Use the following to monitor send rate and size, redo rate and size

    SELECT ag.name AS [AG Name], ar.replica_server_name, ar.availability_mode_desc, adc.[database_name],

    drs.synchronization_state_desc, drs.is_commit_participant,

    drs.synchronization_health_desc, drs.last_sent_time, drs.last_received_time,

    drs.last_hardened_time, drs.last_redone_time, drs.log_send_queue_size,

    drs.log_send_rate, drs.redo_queue_size, drs.redo_rate, drs.last_commit_time, drs.database_state_desc

    FROM sys.dm_hadr_database_replica_states AS drs

    INNER JOIN sys.availability_databases_cluster AS adc

    ON drs.group_id = adc.group_id

    AND drs.group_database_id = adc.group_database_id

    INNER JOIN sys.availability_groups AS ag

    ON ag.group_id = drs.group_id

    INNER JOIN sys.availability_replicas AS ar

    ON drs.group_id = ar.group_id

    AND drs.replica_id = ar.replica_id

    WHERE drs.is_local <> 1

    ORDER BY ag.name, ar.replica_server_name, adc.[database_name]

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle - Wednesday, August 23, 2017 3:52 AM

    Leo.Miller - Tuesday, August 22, 2017 6:15 PM

    but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).

    And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
    Use the following to monitor send rate and size, redo rate and size

    Easier said than done. We don't have access to the CommVault backups and getting the SysAdmin to do this for us is a nightmare. One of the joys of being a consultant.

    I've attached the file. Names have been changed, but the Primary is 02 and the secondary is 03, the problem DB is App02. The other one that was causing a problem isn't in the AG at the moment.

    The only thing I can see as a potential problem is that the Redo_Rate looks a bit low, but this is quite a bit lower than the send rate for all databases, so this may be normal. I've also compared other AoAG servers for this client and these show similar numbers, so they don't look like an issue.

    I'm interested in hearing what you think.

    Leo
    Nothing in life is ever so complicated that with a little effort it cant be made more complicated.

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller - Tuesday, August 22, 2017 6:15 PM

     every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently). 

    I don't think this is a great attitude on his part, and I wouldn't be deferring to it if it were me. You are charged with keeping your Production environment working, and he should respect that. (Anyway, you could use copy-only backups (Full and Log) to re-synch the AGs.) 

    Also, have you double and triple-checked that his Commvault log backups are definitely not copy-only, by checking in msdb.dbo.backupset (there's a column called is_copy_only)? Sorry if you've done that already, but I wasn't sure if you're having to go on what your sysadmin tells you.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply