Transaction Logs in Status 2 for no apparent reason

Question

Transaction Logs in Status 2 for no apparent reason

Leo.Miller

SSChampion

Points: 13070
More actions
August 20, 2017 at 7:52 pm

#323186

I've got a SQL 2012 Server in a 3 Node Always On Configuration. Version: 11.0.5058, so SP2. We are finding 2 of the (31) databases regularly have their transaction logs blow out, with all VLFs in Status 2, but can't find the cause.
We have done the following checks.
1)      Regular Transaction log backups - every 30 minutes using CommVault
2)      Always On is showing as synchronised
3)      There's no Mirroring or CDC
4)      log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)
5)      Removing one of the databases from the AG seems to have fixed the problem for that DB. Further testing is still needed. This is not a long term solution.
Some odd things we have found, but that don't seem likely to cause an issue:
1)      CommVault is being used - I've got no control of this.
2)      All The DBs that don't have issues only getting one daily full and 30 minute Tran Log backups.
3)      One problem DB is getting hourly full and 1/2 hourly log backups
4)      The other problem DB appears to have backups to 2 virtual files, as we would do to improve backup times.
Does anyone have any other things we can look into? Are there any know issues like this? I've found one current, but not very helpful article an the web: VLF Status = 2
Cheers
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Are the log backups being done with no_truncate or copy_only options?

If you run a checkpoint after the log backup, what does the log_reuse_wait change to (it shows the thing that last prevented VLFs from being marked reusable)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Perry Whittle SSC Guru Points: 234065 More actions · Answer 2

Leo.Miller - Sunday, August 20, 2017 7:52 PM
1) Regular Transaction log backups - every 30 minutes using CommVault

30 mins may not be enough for a busy database

Leo.Miller - Sunday, August 20, 2017 7:52 PM
3) There's no Mirroring

And at the core, that's exactly what Availability Groups are 😉

Leo.Miller - Sunday, August 20, 2017 7:52 PM
4) log_reuse_wait_desc indicates "Log Backup", even after a log backup has occurred. (Done by CommVault)

is it taking a log backup and truncating the log?

Leo.Miller - Sunday, August 20, 2017 7:52 PM
5) Removing one of the databases from the AG seems to have fixed the problem for that DB.

Indicates that log usage due to a mirror session is a possible issue

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Leo.Miller SSChampion Points: 13070 More actions · Answer 3

All good points and I'm aware of these. e.g. Mirroring is the core of AoHA. Just covering all the bases. Log backups take <1 second and at most 5 seconds, so I can't see a problem with these being at 30 minute intervals. I prefer more frequent log backups myself, but this is the SysAdmin's call.

We were looking to see if the AoHA (mirroring) was causing the problem, but the log_reuse_wait_desc doesn't show this. The AoHA status shows as being Synchronised and testing show no outstanding synchronised updates. Updates on the primary are visible on the secondary within milliseconds.

No, no_truncate and/or copy_only are not being used.

This is a live system so the testing we can do is limited. We can't leave the DBs in the AG without some risk, but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently). Hence we are still looking for a chance to check what doing a manual checkpoint shows up.

All the databases are using the same routines, same backup config via CommVault, same AoHA group, etc. but only two of the DBs are showing this odd behaviour, MailMarshal and LANSweeper.

Thanks for the ideas so far.
Leo

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Perry Whittle SSC Guru Points: 234065 More actions · Answer 4

Leo.Miller - Tuesday, August 22, 2017 6:15 PM
but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).

And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
Use the following to monitor send rate and size, redo rate and size

SELECT ag.name AS [AG Name], ar.replica_server_name, ar.availability_mode_desc, adc.[database_name],

drs.synchronization_state_desc, drs.is_commit_participant,

drs.synchronization_health_desc, drs.last_sent_time, drs.last_received_time,

drs.last_hardened_time, drs.last_redone_time, drs.log_send_queue_size,

drs.log_send_rate, drs.redo_queue_size, drs.redo_rate, drs.last_commit_time, drs.database_state_desc

FROM sys.dm_hadr_database_replica_states AS drs

INNER JOIN sys.availability_databases_cluster AS adc

ON drs.group_id = adc.group_id

AND drs.group_database_id = adc.group_database_id

INNER JOIN sys.availability_groups AS ag

ON ag.group_id = drs.group_id

INNER JOIN sys.availability_replicas AS ar

ON drs.group_id = ar.group_id

AND drs.replica_id = ar.replica_id

WHERE drs.is_local <> 1

ORDER BY ag.name, ar.replica_server_name, adc.[database_name]

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Leo.Miller SSChampion Points: 13070 More actions · Answer 5

Perry Whittle - Wednesday, August 23, 2017 3:52 AM
Leo.Miller - Tuesday, August 22, 2017 6:15 PM
but every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).
And he'd be correct, you don't need to take a new full backup, just pull the existing full and any logs to reinitialise the db.
Use the following to monitor send rate and size, redo rate and size

Easier said than done. We don't have access to the CommVault backups and getting the SysAdmin to do this for us is a nightmare. One of the joys of being a consultant.

I've attached the file. Names have been changed, but the Primary is 02 and the secondary is 03, the problem DB is App02. The other one that was causing a problem isn't in the AG at the moment.

The only thing I can see as a potential problem is that the Redo_Rate looks a bit low, but this is quite a bit lower than the send rate for all databases, so this may be normal. I've also compared other AoAG servers for this client and these show similar numbers, so they don't look like an issue.

I'm interested in hearing what you think.

Leo
Nothing in life is ever so complicated that with a little effort it cant be made more complicated.

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Beatrix Kiddo SSC-Dedicated Points: 32407 More actions · Answer 6

Leo.Miller - Tuesday, August 22, 2017 6:15 PM
every time we add it back to test something the Sys Admin throws his toys out the cot because our full backup to establish the AG messes with his CommVault schedule (apparently).

I don't think this is a great attitude on his part, and I wouldn't be deferring to it if it were me. You are charged with keeping your Production environment working, and he should respect that. (Anyway, you could use copy-only backups (Full and Log) to re-synch the AGs.)

Also, have you double and triple-checked that his Commvault log backups are definitely not copy-only, by checking in msdb.dbo.backupset (there's a column called is_copy_only)? Sorry if you've done that already, but I wasn't sure if you're having to go on what your sysadmin tells you.